How do I group a MultiIndex DataFrame using different rules on different levels?


Suppose I have a DataFrame like this:

In [58]: arrays = [
   ....:     ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
   ....:     ["one1", "one2", "one1", "one2", "one1", "two", "one1", "two"],
   ....: ]
   ....: 

In [59]: index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])

In [60]: df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2, 3, 3], "B": np.arange(8)}, index=index)

In [61]: df
Out[61]: 
              A  B
first second      
bar   one1     1  0
      one2     1  1
baz   one1     1  2
      one2     1  3
foo   one1     2  4
      two     2  5
qux   one1     3  6
      two     3  7

and I want a result like this:

Out[61]: 
              A  B
first second      
bar   one     2  1
baz   one     2  5
foo   one     2  4
      two     2  5
qux   one     3  6
      two     3  7

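In other words, on level 0 ("first") I want to group by the existing index values and keep them unchanged, while on level 1 ("second") I want to group labels that share the same three-character prefix, roughly like lambda x: x.startswith(x[:3]), and use x[:3] as the new index label. Can this be done with a single groupby call, or is there some other way?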

python dataframe group-by
1 Answer

You can use:

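df = df.reset_index()                  # turn 'first' and 'second' into ordinary columns
df['second'] = df['second'].str[:3]    # keep only the first three characters of each label
df2 = df.groupby(['first', 'second'])[['A', 'B']].sum()
print(df2)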

This gives:

              A  B
first second      
bar   one     2  1
baz   one     2  5
foo   one     2  4
      two     2  5
qux   one     3  6
      two     3  7
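
As for whether groupby alone is enough: a minimal sketch, assuming a reasonably recent pandas, is to pass the grouping keys straight from the index levels rather than resetting the index first. The names first, second and result below are only illustrative; the data is the frame from the question.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question
arrays = [
    ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
    ["one1", "one2", "one1", "one2", "one1", "two", "one1", "two"],
]
index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])
df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2, 3, 3], "B": np.arange(8)}, index=index)

# Level 0 is used as-is; level 1 is truncated to its first three characters.
first = df.index.get_level_values("first")
second = df.index.get_level_values("second").str[:3].rename("second")

# groupby accepts array-like keys that have the same length as the frame
result = df.groupby([first, second]).sum()
print(result)
```

Unlike the reset_index version, this leaves df itself untouched, which may matter if the original MultiIndex is still needed afterwards.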