If I have a DataFrame like this:
In [58]: arrays = [
   ....:     ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
   ....:     ["one1", "one2", "one1", "one2", "one1", "two", "one1", "two"],
   ....: ]
   ....:
In [59]: index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])
In [60]: df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2, 3, 3], "B": np.arange(8)}, index=index)
In [61]: df
Out[61]:
              A  B
first second
bar   one1    1  0
      one2    1  1
baz   one1    1  2
      one2    1  3
foo   one1    2  4
      two     2  5
qux   one1    3  6
      two     3  7
then I want this result:
Out[61]:
              A  B
first second
bar   one     2  1
baz   one     2  5
foo   one     2  4
      two     2  5
qux   one     3  6
      two     3  7
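In other words, I want to group on 'level=0' and keep its own index values, and at the same time group 'level=1' with something like lambda x: x.startswith(x[:3]), using x[:3] as the new index value.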
Can this only be done with a groupby statement, or is there another way?
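The grouping described above (level 0 unchanged, level 1 keyed by its first three characters) can also be expressed without resetting the index, by passing groupby one key array per index level. A minimal sketch, assuming a reasonably recent pandas; the names first, second and result are illustrative and not from the question:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
arrays = [
    ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
    ["one1", "one2", "one1", "one2", "one1", "two", "one1", "two"],
]
index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])
df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2, 3, 3], "B": np.arange(8)}, index=index)

# One grouping key per index level:
# level 0 is used as-is, level 1 is truncated to its first 3 characters.
first = df.index.get_level_values("first")
second = df.index.get_level_values("second").str[:3]

result = df.groupby([first, second]).sum()
result.index.names = ["first", "second"]  # set the level names explicitly
print(result)
```

Because the keys are arrays aligned with the rows, the original index is never modified; only the grouping labels are derived from it.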
You can use:
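df = df.reset_index()                  # turn the 'first' and 'second' index levels into columns
df['second'] = df['second'].str[:3]    # keep the first 3 characters: one1/one2 -> one
df2 = df.groupby(['first', 'second'])[['A', 'B']].sum()  # sum A and B per (first, second) pair
print(df2)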
which gives:
              A  B
first second
bar   one     2  1
baz   one     2  5
foo   one     2  4
      two     2  5
qux   one     3  6
      two     3  7
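Another option, sketched under the same assumptions, is to rewrite only the labels of the 'second' level with DataFrame.rename(index=..., level=...) and then group on both index levels by name, which keeps the MultiIndex and its level names without a reset_index round trip. The names renamed and result are illustrative:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
arrays = [
    ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
    ["one1", "one2", "one1", "one2", "one1", "two", "one1", "two"],
]
index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])
df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2, 3, 3], "B": np.arange(8)}, index=index)

# Apply the function only to the 'second' level (one1/one2 -> one);
# the 'first' level is left untouched.
renamed = df.rename(index=lambda label: label[:3], level="second")

# Both levels keep their names, so they can be grouped on directly.
result = renamed.groupby(level=["first", "second"]).sum()
print(result)
```

The renamed frame temporarily has duplicate index entries such as ('bar', 'one') twice; groupby then collapses each duplicate pair into one summed row.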