Grouping a DataFrame to find counts and sums on a column

Question

I have a DataFrame like this:

       customer        fruit    price
0      cust1           mango     30
1      cust2           apple     45
2      cust1           banana    55
3      cust3           mango     22
4      cust4           banana    54
5      cust3           apple     55
6      cust2           apple     90
7      cust1           mango     45
8      cust3           banana    45
9      cust2           mango     23
10     cust4           mango     44

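I need to find out how much each customer spent on mangoes versus all other fruits (that is, everything that is not mango, treated as a single category). I also need to count each customer's purchases the same way, with mango as its own category and the other fruits in a separate column. For example: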

      customer   price spent_on_mango  spent_on_others
0      cust1          75                    55   
1      cust2          23                    135       
2      cust3          22                    100
3      cust4          44                    54

Answer 1

Why not create a column that indicates whether the fruit is a mango, and then include it in your groupby?

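df['mango'] = df.fruit == 'mango'
# Sum only 'price'; summing every column would also try to aggregate 'fruit'
df2 = df.groupby(['customer', 'mango'])['price'].sum().unstack()
df2.columns = ['not mango', 'mango']  # unstacked columns are ordered False, True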

>>> df2
          not mango  mango
customer                  
cust1            55     75
cust2           135     23
cust3           100     22
cust4            54     44
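
To get the exact layout asked for in the question, the boolean key can also be passed to groupby directly instead of being stored as a column. The following is only a minimal sketch, assuming the sample data from the question; the names is_mango and totals are illustrative and not part of the original answer.

```python
import pandas as pd

# The sample data from the question.
df = pd.DataFrame({
    "customer": ["cust1", "cust2", "cust1", "cust3", "cust4", "cust3",
                 "cust2", "cust1", "cust3", "cust2", "cust4"],
    "fruit": ["mango", "apple", "banana", "mango", "banana", "apple",
              "apple", "mango", "banana", "mango", "mango"],
    "price": [30, 45, 55, 22, 54, 55, 90, 45, 45, 23, 44],
})

# Boolean grouping key computed on the fly, so df itself is not modified.
is_mango = (df["fruit"] == "mango").rename("is_mango")

# Sum the price per (customer, is_mango) pair and spread is_mango into columns.
totals = df.groupby(["customer", is_mango])["price"].sum().unstack(fill_value=0)

# Give the False/True columns the names used in the question.
totals = totals.rename(columns={True: "spent_on_mango", False: "spent_on_others"})
totals = totals[["spent_on_mango", "spent_on_others"]].reset_index()
totals.columns.name = None  # drop the leftover 'is_mango' label on the column axis

print(totals)
```

Passing fill_value=0 to unstack means a customer who never bought one of the two categories would show 0 rather than NaN.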

Answer 2

We can replace every element of 'fruit' that is not 'mango' with 'others', then groupby on ('customer', 'fruit'), take the sum, and unstack.

import pandas as pd
# df1 holds the data from the question; note that this overwrites 'fruit' in place
df1.loc[df1.fruit != 'mango', 'fruit'] = 'others'
print(df1.groupby(['customer', 'fruit']).sum().unstack())
#         price       
#fruit    mango others
#customer             
#cust1       75     55
#cust2       23    135
#cust3       22    100
#cust4       44     54
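
The question also asks for purchase counts, which the sum alone does not give. As a rough sketch (assuming the sample data from the question; the names category and summary are made up for illustration), the same mango/others split can feed agg to get both figures at once, without overwriting the 'fruit' column:

```python
import pandas as pd

# The sample data from the question.
df = pd.DataFrame({
    "customer": ["cust1", "cust2", "cust1", "cust3", "cust4", "cust3",
                 "cust2", "cust1", "cust3", "cust2", "cust4"],
    "fruit": ["mango", "apple", "banana", "mango", "banana", "apple",
              "apple", "mango", "banana", "mango", "mango"],
    "price": [30, 45, 55, 22, 54, 55, 90, 45, 45, 23, 44],
})

# Keep 'mango' where the condition holds and label everything else 'others'.
# Series.where returns a new Series, so df['fruit'] stays untouched.
category = df["fruit"].where(df["fruit"] == "mango", "others").rename("category")

# Total spend and number of purchases per customer and category.
summary = (
    df.groupby(["customer", category])["price"]
      .agg(["sum", "count"])
      .unstack(fill_value=0)
)

print(summary)
```

The result has two column levels: the outer level is the aggregation (sum or count) and the inner level is the category, so summary["count"] selects just the purchase counts.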

Answer 3

Another pandas approach:

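import pandas as pd
# Use .loc rather than chained indexing so the assignment reliably modifies df
df.loc[df.fruit != 'mango', 'fruit'] = 'other_fruit'
pd.pivot_table(df, values='price', index='customer', columns='fruit', aggfunc='sum')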

fruit     mango  other_fruit
customer                    
cust1        75           55
cust2        23          135
cust3        22          100
cust4        44           54

Answer 4

Alternatively, you can do this as a pivot_table:

In [11]: res = df.pivot_table("price", "customer", "fruit", aggfunc="sum", fill_value=0)

In [12]: res
Out[12]:
fruit     apple  banana  mango
customer
cust1         0      55     75
cust2       135       0     23
cust3        55      45     22
cust4         0      54     44

This may already be good enough, but you can also build the "not mango" column you asked for:

In [13]: mango = res.pop("mango")

In [14]: res.sum(axis=1).to_frame(name="not mango").join(mango)
Out[14]:
          not mango  mango
customer
cust1            55     75
cust2           135     23
cust3           100     22
cust4            54     44

In general, whenever you see stack/unstack, you should try a pivot :).

Answer 5

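Try grouping by the relevant columns and then applying sum(), like this:

print(dframe.groupby(["customer", "fruit"]).sum())

As the command itself suggests, it groups the rows by those columns and adds up the values.

It returns a DataFrame with the total spent per customer and fruit, which contains the information you need.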
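
The per-fruit totals still need to be collapsed into mango versus everything else. One possible way to do that, sketched here under the assumption that the frame holds the sample data from the question (the names per_fruit, wide and result are illustrative only):

```python
import pandas as pd

# The sample data from the question.
dframe = pd.DataFrame({
    "customer": ["cust1", "cust2", "cust1", "cust3", "cust4", "cust3",
                 "cust2", "cust1", "cust3", "cust2", "cust4"],
    "fruit": ["mango", "apple", "banana", "mango", "banana", "apple",
              "apple", "mango", "banana", "mango", "mango"],
    "price": [30, 45, 55, 22, 54, 55, 90, 45, 45, 23, 44],
})

# Total spend per (customer, fruit), as in the answer above.
per_fruit = dframe.groupby(["customer", "fruit"])["price"].sum()

# One column per fruit; combinations that never occur become 0.
wide = per_fruit.unstack(fill_value=0)

# Keep the mango column and add up all the remaining fruit columns.
result = pd.DataFrame({
    "spent_on_mango": wide["mango"],
    "spent_on_others": wide.drop(columns="mango").sum(axis=1),
})

print(result)
```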
