I have the following DataFrame:
ID customer Month Amount
0 026 201707 31,65
1 026 201708 31,65
2 026 201709 31,65
3 026 201710 31,65
4 026 201711 31,65
.....
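where 'Amount' has object dtype. I want to compute the sum and the average amount for each ID. First, I tried converting the 'Amount' column from object to float:
df['Amount'] = pd.to_numeric(df['Amount'], errors='coerce')
But I get NaN for every value in the 'Amount' column: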
ID customer Month Amount
0 026 201707 NaN
....
How can I convert the object column to float with real numbers and aggregate the values for each customer (sum, average, mean)?
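Use str.replace to replace those commas with dots, then group and compute the aggregations as needed (I'm not sure what output format you want, but basically something like the following):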
df['Amount'] = pd.to_numeric(df.Amount.str.replace(',','.'), errors='coerce')
print(df)
ID customer Month Amount
0 0 26 201707 31.65
1 1 26 201708 31.65
2 2 26 201709 31.65
3 3 26 201710 31.65
4 4 26 201711 31.65
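As a minimal, self-contained sketch of the conversion step, the snippet below shows why the direct conversion yields NaN and how replacing the decimal comma first fixes it. The sample frame is rebuilt by hand from the question's data, the `customer` values are assumed to be strings, and the names `raw` and `converted` are illustrative only.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (an assumption about its exact
# layout); Amount is stored as text with a decimal comma.
df = pd.DataFrame({
    "ID": [0, 1, 2, 3, 4],
    "customer": ["026"] * 5,
    "Month": [201707, 201708, 201709, 201710, 201711],
    "Amount": ["31,65"] * 5,
})

# Direct conversion: "31,65" cannot be parsed as a number,
# so errors="coerce" turns every value into NaN.
raw = pd.to_numeric(df["Amount"], errors="coerce")

# Replace the decimal comma with a dot first, then convert.
# regex=False treats "," as a literal character rather than a pattern.
converted = pd.to_numeric(
    df["Amount"].str.replace(",", ".", regex=False), errors="coerce"
)

print(raw.isna().all())   # whether the direct conversion lost every value
print(converted.dtype)    # dtype after the comma-to-dot replacement
```

Passing `regex=False` is a design choice here: the comma is a plain character, so no regular expression is needed.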
Use Series.str.replace to convert , to . before calling pd.to_numeric; then you can use df.groupby('ID').Amount.mean().
To compute the aggregates for each ID, use groupby.agg:
agg_df = (df.assign(Amount = pd.to_numeric(df['Amount'].str.replace(',','.'),
errors = 'coerce'))
.groupby('ID').Amount.agg(['mean','sum']))
print(agg_df)
# Alternatively, convert the type of the Amount column first:
#df['Amount'] = pd.to_numeric(df['Amount'].str.replace(',','.'), errors='coerce')
#agg_df = df.groupby('ID').Amount.agg(['mean','sum'])
mean sum
ID
0 31.65 31.65
1 31.65 31.65
2 31.65 31.65
3 31.65 31.65
4 31.65 31.65
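Since the question also asks for totals per customer, the same pattern can be grouped on the `customer` column instead of `ID`. The following is only a sketch under assumptions: the sample frame is rebuilt by hand from the question's data (all five rows belonging to customer `026`), and the names `sample` and `per_customer` are hypothetical.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (assumed layout).
sample = pd.DataFrame({
    "ID": [0, 1, 2, 3, 4],
    "customer": ["026"] * 5,
    "Month": [201707, 201708, 201709, 201710, 201711],
    "Amount": ["31,65"] * 5,
})

# Convert the decimal-comma strings to floats.
sample["Amount"] = pd.to_numeric(
    sample["Amount"].str.replace(",", ".", regex=False), errors="coerce"
)

# One row per customer, with the sum and the mean of Amount as columns.
per_customer = sample.groupby("customer")["Amount"].agg(["sum", "mean"])
print(per_customer)
```

Grouping on `ID` in this sample gives one row per record, because every ID occurs once; grouping on `customer` collapses the five months into a single row.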