I have this df:
  group owner  failed granted_pe  slots
0    g1    u1       0     single      1
1   g50   u92       0     shared      8
2   g50   u92       0     shared      1
The df can be created with the following code:
import pandas as pd

df = pd.DataFrame([['g1', 'u1', 0, 'single', 1],
                   ['g50', 'u92', '0', 'shared', '8'],
                   ['g50', 'u92', '0', 'shared', '1']],
                  columns=['group', 'owner', 'failed', 'granted_pe', 'slots'])
# Some values are entered as strings, so cast each column to its intended dtype.
df = df.astype(dtype={'group': 'str', 'owner': 'str', 'failed': 'int', 'granted_pe': 'str', 'slots': 'int'})
print(df)
Using groupby, I created three columns computed on the "slots" column:
df_calculated = pd.concat([
df.loc[:,['group', 'slots']].groupby(['group']).sum(),
df.loc[:,['group', 'slots']].groupby(['group']).mean(),
df.loc[:,['group', 'slots']].groupby(['group']).max()
], axis=1)
print(df_calculated)
       slots  slots  slots
group
g1         1    1.0      1
g50        9    4.5      8
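Question 1: Naming the new columns properly. Can I add an argument to concat to name these columns "slots_sum", "slots_avg" and "slots_max"?
Question 2: Adding the columns to df. I would prefer to add the new columns to df, to the right of the "source" column (in this case "slots"). The desired output would look like this: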
  group owner  failed granted_pe  slots  slots_sum  slots_avg  slots_max
0    g1    u1       0     single      1          1        1.0          1
1   g50   u92       0     shared      8          9        4.5          8
2   g50   u92       0     shared      1
My actual df has 4.5 million rows and 23 columns. I would like to do something similar for other columns.
Use agg with add_prefix, then merge the result back:
yourdf=df.merge(df.groupby('group')['slots'].agg(['sum','mean','max']).add_prefix('slots_').reset_index(),how='left')
Out[86]:
  group owner  failed  ...  slots_sum  slots_mean  slots_max
0    g1    u1       0  ...          1         1.0          1
1   g50   u92       0  ...          9         4.5          8
2   g50   u92       0  ...          9         4.5          8
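Note that add_prefix produces "slots_mean" rather than the "slots_avg" name asked for in the question. If the exact names matter, one possible variation (a sketch, assuming pandas 0.25 or later, where named aggregation is available) is to pass the output names as keywords to agg. The variable names stats and result are only illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    [['g1', 'u1', 0, 'single', 1],
     ['g50', 'u92', 0, 'shared', 8],
     ['g50', 'u92', 0, 'shared', 1]],
    columns=['group', 'owner', 'failed', 'granted_pe', 'slots'])

# Named aggregation: each keyword is the output column name,
# each value is the aggregation applied to "slots" within a group.
stats = (df.groupby('group')['slots']
           .agg(slots_sum='sum', slots_avg='mean', slots_max='max')
           .reset_index())

# Left-merge on the group key so every original row gets its group's statistics.
result = df.merge(stats, on='group', how='left')
print(result)
```

Passing on='group' explicitly keeps the merge from silently picking up any other column names the two frames might share.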
Another way is to use the keys parameter of pd.concat and then flatten the MultiIndex column headers:
df = pd.DataFrame([['g1', 'u1', 0, 'single', 1],
['g50', 'u92', '0', 'shared', '8'],
['g50', 'u92', '0', 'shared', '1']],
columns=['group', 'owner', 'failed','granted_pe', 'slots'])
df = (df.astype(dtype={'group':'str', 'owner':'str','failed':'int', 'granted_pe':'str', 'slots':'int'}))
df_calculated = pd.concat([
df.loc[:,['group', 'slots']].groupby(['group']).sum(),
df.loc[:,['group', 'slots']].groupby(['group']).mean(),
df.loc[:,['group', 'slots']].groupby(['group']).max()
], axis=1, keys=['sum','mean','max'])
df_calculated.columns = [f'{j}_{i}' for i,j in df_calculated.columns]
print(df_calculated)
Output:
       slots_sum  slots_mean  slots_max
group
g1             1         1.0          1
g50            9         4.5          8
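The result above is indexed by group, so it still has to be attached to the original rows to answer Question 2. A minimal sketch of one way to do that, assuming the same sample data, is DataFrame.join with on='group', which matches the "group" column of df against the index of df_calculated. The names by_group and result are only illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    [['g1', 'u1', 0, 'single', 1],
     ['g50', 'u92', 0, 'shared', 8],
     ['g50', 'u92', 0, 'shared', 1]],
    columns=['group', 'owner', 'failed', 'granted_pe', 'slots'])

# Build the per-group statistics with concat + keys, as in the answer above.
by_group = df[['group', 'slots']].groupby('group')
df_calculated = pd.concat(
    [by_group.sum(), by_group.mean(), by_group.max()],
    axis=1, keys=['sum', 'mean', 'max'])

# Flatten the (stat, column) MultiIndex into names like "slots_sum".
df_calculated.columns = [f'{col}_{stat}' for stat, col in df_calculated.columns]

# Match df's "group" column against df_calculated's index (a left join by default).
result = df.join(df_calculated, on='group')
print(result)
```

Because "slots" is the last column here, the new columns land directly to its right. For a source column in the middle of the frame, the columns would need to be reordered afterwards.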