Is there a way to use groupby to calculate the mean of a text column?

Question

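I have been using pandas.groupby to pivot my data and to build descriptive charts and tables. When grouping by three variables, I keep running into the error DataError: No numeric types to aggregate whenever I work with the cancelled column.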

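To describe my data: Year and Month hold the year and month of each order (multiple years, all months). Type holds the type of item ordered (clothes, appliances, etc.), and cancelled holds the string value yes or no, indicating whether the order was cancelled.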

I want to plot a chart and show a table of the cancellation rate (and success rate) for the order items. This is the approach I am currently using:

df.groupby(['Year', 'Month', 'Type'])['cancelled'].mean()

But this does not seem to work.

Sample data

Year    Month        Type          cancelled 
2012      1        electronics       yes
2012      10         fiber           yes
2012      9         clothes          no
2013      4        vegetables        yes
2013      5        appliances        no
2016      3        fiber             no
2017      1        clothes           yes

Answer 1

Use the following sample data:

import pandas as pd

# Small toy frame: one year, two months, two item types
df = pd.DataFrame({
         'Year':[2020] * 6,
         'Month':[7,8,7,8,7,8],
         'cancelled':['yes','no'] * 3,
         'Type':list('aaaaba')
})
print (df)

Get the yes/no counts for each combination of the Year, Month and Type columns.

df1 = df.groupby(['Year', 'Month', 'Type','cancelled']).size().unstack(fill_value=0)
print (df1)
cancelled        no  yes
Year Month Type         
2020 7     a      0    2
           b      0    1
     8     a      3    0

Then divide by the column sums and multiply by 100 to get percentages.

df2 = df1.div(df1.sum()).mul(100)
print (df2)
cancelled           no        yes
Year Month Type                  
2020 7     a       0.0  66.666667
           b       0.0  33.333333
     8     a     100.0   0.000000
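
Note that df1.sum() adds up each column, so the percentages above are each group's share of all yes (or all no) orders. If what is wanted is instead the cancellation rate within each Year/Month/Type group, one possible variant is to divide every row by its own total. This is only a sketch on the same toy frame; the names counts and rates are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2020] * 6,
    'Month': [7, 8, 7, 8, 7, 8],
    'cancelled': ['yes', 'no'] * 3,
    'Type': list('aaaaba'),
})

# Count yes/no per (Year, Month, Type) group, one column per answer.
counts = (df.groupby(['Year', 'Month', 'Type', 'cancelled'])
            .size()
            .unstack(fill_value=0))

# Divide each row by its own total, so every row sums to 100.
rates = counts.div(counts.sum(axis=1), axis=0).mul(100)
print(rates)
```

With this normalization the yes column reads directly as the cancellation rate of the group and the no column as its success rate.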

Answer 2

I may have misunderstood what you want the output to look like, but to find the cancellation rate for each item type you can do this:

# change 'cancelled' to numeric values
df.loc[df['cancelled'] == 'yes', 'cancelled'] = 1
df.loc[df['cancelled'] == 'no', 'cancelled'] = 0

# get the mean of 'cancelled' for each item type
res = {}
for t in df['Type'].unique():
    res[t] = df.loc[df['Type'] == t, 'cancelled'].mean()

# if desired, put it into a dataframe
results = pd.DataFrame([res], index=['Rate']).T

Output

              Rate
electronics   1.0
fiber         0.5
clothes       0.5
vegetables    1.0
appliances    0.0
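
The same per-type rates can probably be obtained without an explicit loop, by turning the yes/no strings into booleans and letting groupby take the mean. The sketch below rebuilds the sample from the question; the variable names is_cancelled, rate_by_type and rate_by_group are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    'Year': [2012, 2012, 2012, 2013, 2013, 2016, 2017],
    'Month': [1, 10, 9, 4, 5, 3, 1],
    'Type': ['electronics', 'fiber', 'clothes', 'vegetables',
             'appliances', 'fiber', 'clothes'],
    'cancelled': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'yes'],
})

# Turn the yes/no strings into booleans; True counts as 1 in a mean.
is_cancelled = df['cancelled'].eq('yes')

# The mean of a boolean Series is the fraction of True values per group.
rate_by_type = is_cancelled.groupby(df['Type']).mean()
print(rate_by_type)

# The same idea with the three grouping keys from the question.
rate_by_group = is_cancelled.groupby(
    [df['Year'], df['Month'], df['Type']]).mean()
print(rate_by_group)
```

Because the column being averaged is boolean rather than text, groupby has a numeric type to aggregate and the DataError from the question should not occur.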

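Note: if you want to restrict the result to a specific year or month, you can do that with loc as well. However, since your sample data has no repeated item types within any given year or month, doing so would simply return the original rows of your sample dataframe.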
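
As a rough illustration of the loc filtering mentioned in the note, the sketch below keeps only one year before computing the rate. The year 2012 is just an example value, and the names in_2012 and rate_2012 are hypothetical.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    'Year': [2012, 2012, 2012, 2013, 2013, 2016, 2017],
    'Month': [1, 10, 9, 4, 5, 3, 1],
    'Type': ['electronics', 'fiber', 'clothes', 'vegetables',
             'appliances', 'fiber', 'clothes'],
    'cancelled': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'yes'],
})

# Convert the text column to 1/0 so it can be averaged.
df['cancelled'] = df['cancelled'].map({'yes': 1, 'no': 0})

# Keep only the rows for one year (2012 is an arbitrary example).
in_2012 = df.loc[df['Year'] == 2012]

# Cancellation rate per item type within that year.
rate_2012 = in_2012.groupby('Type')['cancelled'].mean()
print(rate_2012)
```

On the sample each item type appears only once in 2012, so every rate is either 0 or 1, which matches the caveat in the note.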
