I am trying to add a subtotal column for each group and a grand total column at the end. For example:
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                   "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
which gives
     A    B      C  D  E
0  foo  one  small  1  2
1  foo  one  large  2  4
2  foo  one  large  2  5
3  foo  two  small  3  5
4  foo  two  small  3  6
5  bar  one  large  4  6
6  bar  one  small  5  8
7  bar  two  small  6  9
8  bar  two  large  7  9
Now I pivot it:
table = pd.pivot_table(df, values=['D',"E"], index=['A'],columns=['C'])
and then add the totals:
table['total'] = table.sum(axis=1)
for t in ["D", "E"]:
    table[t, "partial_total"] = table[t].sum(axis=1)
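While this is numerically correct, the layout is annoying to read. I would like all of the D columns together (including partial_total), then the E columns, then total. This is my resulting DataFrame: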
         D                E                total              D              E
C    large     small  large     small             partial_total  partial_total
A
bar    5.5  5.500000    7.5  8.500000  27.000000      11.000000      16.000000
foo    2.0  2.333333    4.5  4.333333  13.166667       4.333333       8.833333
So, how can I group the columns that share the same top-level label?
One approach is to do the aggregation before calling pivot_table:
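g = df.groupby(['A', 'C'])[['D', 'E']]
d = (g.sum() / g.count()).reset_index()  # per-(A, C) mean of D and E
# Sum the means within each A to get the per-group partial total
m = d.groupby('A', as_index=False)[['D', 'E']].sum().assign(C='partial')
final = pd.concat([m, d]).pivot_table(index='A', columns='C')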
         D                          E
C    large    partial     small  large    partial     small
A
bar    5.5  11.000000  5.500000    7.5  16.000000  8.500000
foo    2.0   4.333333  2.333333    4.5   8.833333  4.333333
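For what it's worth, g.sum() / g.count() is just the per-group mean, which is why these numbers match pivot_table's default aggregation. Here is a minimal, self-contained sketch of the same idea using g.mean() directly. The names by_ratio and by_mean are only illustrative, and the frame is rebuilt without column B because it is not used here.

```python
import pandas as pd

# Same data as in the question, minus the unused column B.
df = pd.DataFrame({
    "A": ["foo"] * 5 + ["bar"] * 4,
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})

g = df.groupby(["A", "C"])[["D", "E"]]
by_ratio = g.sum() / g.count()
by_mean = g.mean()
# The two ways of computing the per-(A, C) average agree.
pd.testing.assert_frame_equal(by_ratio, by_mean)

d = by_mean.reset_index()
# Sum the means within each A and label that row as the partial total.
m = d.groupby("A", as_index=False)[["D", "E"]].sum().assign(C="partial")
final = pd.concat([m, d]).pivot_table(index="A", columns="C")
print(final)
```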
To answer your last question specifically,
How can I group the columns that share the same top-level label?
you can simply use sort_index:
table.sort_index(axis=1)
         D                               E                                total
C    large  partial_total     small  large  partial_total     small
A
bar    5.5      11.000000  5.500000    7.5      16.000000  8.500000  27.000000
foo    2.0       4.333333  2.333333    4.5       8.833333  4.333333  13.166667
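Note that sort_index orders the second level alphabetically, so partial_total lands between large and small. If you want exactly the order from the question (large, small, partial_total within each group, then total), one option is to spell the order out and reindex the columns. A minimal sketch, assuming the table is built as in the question; order and ordered are illustrative names only.

```python
import pandas as pd

# Same data as in the question, minus the unused column B.
df = pd.DataFrame({
    "A": ["foo"] * 5 + ["bar"] * 4,
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})

table = pd.pivot_table(df, values=["D", "E"], index=["A"], columns=["C"])
table["total"] = table.sum(axis=1)
for t in ["D", "E"]:
    table[t, "partial_total"] = table[t].sum(axis=1)

# Explicit column order: each top-level group in full, then the grand total.
order = [(t, c) for t in ["D", "E"] for c in ["large", "small", "partial_total"]]
order.append(("total", ""))
new_columns = pd.MultiIndex.from_tuples(order, names=table.columns.names)
ordered = table.reindex(columns=new_columns)
print(ordered)
```

This is more verbose than sort_index, but it does not depend on how the labels happen to sort.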
You can use margins (note that this example aggregates with aggfunc='sum' rather than the default mean):
new_df = (df.pivot_table(index='A', columns='C',
                         values=['D', 'E'], aggfunc='sum',
                         margins=True, margins_name='partial_total')
            .assign(total=lambda x: x.loc[:, (slice(None), 'partial_total')].sum(axis=1))
         )
Output:
                   D                            E                        total
C              large  small  partial_total  large  small  partial_total
A
bar               11     11             22     15     17             32     54
foo                4      7             11      9     13             22     33
partial_total     15     18             33     24     30             54     87
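As the output shows, margins=True adds a partial_total row as well as the partial_total columns. If only the column subtotals are wanted, the extra row can be dropped afterwards. A minimal sketch under that assumption; cols_only is an illustrative name.

```python
import pandas as pd

# Same data as in the question, minus the unused column B.
df = pd.DataFrame({
    "A": ["foo"] * 5 + ["bar"] * 4,
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})

new_df = (
    df.pivot_table(index="A", columns="C", values=["D", "E"],
                   aggfunc="sum", margins=True, margins_name="partial_total")
      .assign(total=lambda x: x.loc[:, (slice(None), "partial_total")].sum(axis=1))
)

# Keep the subtotal columns but drop the margin row added by margins=True.
cols_only = new_df.drop(index="partial_total")
print(cols_only)
```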
Try the following, using pd.concat:
table = pd.pivot_table(df, values=['D', 'E'], index=['A'], columns=['C'])
# Sum within each top-level column (D, E) before the labels are flattened
partial = table.T.groupby(level=0).sum().T.add_suffix('_partial_total')
total = table.sum(axis=1).to_frame(name='total')
table.columns = [f'{i}_{j}' for i, j in table.columns]
pd.concat([table, partial, total], axis=1)
Output:
     D_large   D_small  E_large   E_small  D_partial_total  E_partial_total      total
A
bar      5.5  5.500000      7.5  8.500000        11.000000        16.000000  27.000000
foo      2.0  2.333333      4.5  4.333333         4.333333         8.833333  13.166667