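I have a DataFrame built with from_records from a Django queryset, and I pivot it on two columns to get a dashboard view.
I managed to get the global row and column sums for the whole table, but what I want is sums by the first pivot column level (a row subtotal for each group under the first column level).
I'm new to pandas, but I'm learning.
My DataFrame looks like this: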
type amount source fund
0 Ressource Humaine CDD -36470.36 Expense fund2
1 Mission -1686.47 Expense fund2
2 Fonctionnement -817465.91 Expense fund1
3 Fonctionnement 1118691.65 Budget fund1
4 Fonctionnement -6000 Expense fund3
5 Fonctionnement -23621.83 Expense fund2
6 Frais de Gestion -53499 Expense fund2
7 Fonctionnement 15000 Budget fund3
8 Frais de Gestion 53499 Budget fund2
9 Fonctionnement 186718.78 Budget fund2
10 Mission 1686.47 Budget fund2
11 Ressource Humaine CDD 38676.53 Budget fund2
To get an overview of fund availability in a dashboard, I pivot it as follows:
piv=cpd.pivot_table(index="type", columns=["fund","source"], values="amount", aggfunc='sum', margins=True, margins_name='Sum')
which gives:
fund fund1 fund2 fund3
source Budget Expense Budget Expense Budget Expense
type
Fonctionnement 1118691.65 -817465.91 186718.78 -23621.83 15000.00 -6000.00
Frais de Gestion NaN NaN 53499.00 -53499.00 NaN NaN
Mission NaN NaN 1686.47 -1686.47 NaN NaN
Ressource Humaine CDD NaN NaN 38676.53 -36470.36 NaN NaN
(The totals are missing here, but I already have them.)
I would like to end up with something like this:
fund fund1 fund2 fund3
source Budget Expense total fund1 Budget Expense total fund2 Budget Expense total fund3
type
Fonctionnement 1118691.65 -817465.91 301 226€ 186718.78 -23621.83 163 097€ 15000.00 -6000.00 9 000€
Frais de Gestion NaN NaN NaN 53499.00 -53499.00 0 NaN NaN NaN
Mission NaN NaN NaN 1686.47 -1686.47 0 NaN NaN NaN
Ressource Humaine CDD NaN NaN NaN 38676.53 -36470.36 2 207€ NaN NaN NaN
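I've seen some hints about using pandas concat for multi-index pivots (e.g. "Pivot table subtotals in Pandas").
I've tried looping over the columns, reading the headers, and so on, but I can't get any further because I'm a noob!
How can I insert/append an intermediate column holding the sum, and how do I compute this subtotal?
You can do a normal pivot, then compute and append the sums: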
import pandas as pd

# do a normal pivot (no margins)
df = df.pivot_table(
    index="type",
    columns=["fund", "source"],
    values="amount",
    aggfunc="sum",
)

# compute one "Total <fund>" frame per first-level column group
dfs = []
for c in df.columns.get_level_values(0).unique():
    # skipna=False: the total is NaN if any value in the group is NaN
    s = df.loc[:, c].sum(axis=1, skipna=False)
    dfs.append(pd.DataFrame(s, index=s.index, columns=[(c, f"Total {c}")]))

# concat them together, then sort the columns so each total follows its fund
out = pd.concat([df, pd.concat(dfs, axis=1)], axis=1)
out = out[sorted(out.columns)]
print(out)
Prints:
fund fund1 fund2 fund3
source Budget Expense Total fund1 Budget Expense Total fund2 Budget Expense Total fund3
type
Fonctionnement 1118691.65 -817465.91 301225.74 186718.78 -23621.83 163096.95 15000.0 -6000.0 9000.0
Frais de Gestion NaN NaN NaN 53499.00 -53499.00 0.00 NaN NaN NaN
Mission NaN NaN NaN 1686.47 -1686.47 0.00 NaN NaN NaN
Ressource Humaine CDD NaN NaN NaN 38676.53 -36470.36 2206.17 NaN NaN NaN
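To try this end to end, here is a self-contained sketch that rebuilds the question's sample rows (amounts written as floats) and applies the same per-fund sum. It is a variation on the code above rather than the same code: the helper name `add_fund_totals` is made up for illustration, and instead of sorting the columns it uses `pd.concat` with `keys` so that each total stays next to its fund. Sorting works in the original only because "Total ..." happens to sort after "Budget" and "Expense".

```python
import pandas as pd

# Sample rows copied from the question (type, amount, source, fund).
rows = [
    ("Ressource Humaine CDD", -36470.36, "Expense", "fund2"),
    ("Mission", -1686.47, "Expense", "fund2"),
    ("Fonctionnement", -817465.91, "Expense", "fund1"),
    ("Fonctionnement", 1118691.65, "Budget", "fund1"),
    ("Fonctionnement", -6000.0, "Expense", "fund3"),
    ("Fonctionnement", -23621.83, "Expense", "fund2"),
    ("Frais de Gestion", -53499.0, "Expense", "fund2"),
    ("Fonctionnement", 15000.0, "Budget", "fund3"),
    ("Frais de Gestion", 53499.0, "Budget", "fund2"),
    ("Fonctionnement", 186718.78, "Budget", "fund2"),
    ("Mission", 1686.47, "Budget", "fund2"),
    ("Ressource Humaine CDD", 38676.53, "Budget", "fund2"),
]
cpd = pd.DataFrame(rows, columns=["type", "amount", "source", "fund"])

piv = cpd.pivot_table(
    index="type", columns=["fund", "source"], values="amount", aggfunc="sum"
)


def add_fund_totals(piv):
    """Hypothetical helper: append a 'Total <fund>' column to each fund group."""
    funds = list(piv.columns.get_level_values(0).unique())
    blocks = []
    for fund in funds:
        block = piv[fund]  # sub-frame with the Budget / Expense columns
        # skipna=False: the total is NaN if any value in the group is NaN
        total = block.sum(axis=1, skipna=False)
        blocks.append(block.assign(**{f"Total {fund}": total}))
    # keys= rebuilds the first column level in the original fund order
    return pd.concat(blocks, axis=1, keys=funds, names=["fund", "source"])


out = add_fund_totals(piv)
print(out)
```

If a missing Budget or Expense should count as zero instead of turning the total into NaN, dropping `skipna=False` would be one option, at the cost of showing 0 for funds that have no rows at all for a given type.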
I also used the approach of adding subtotal rows, with a few extras:
col = {'a':'Column A','b':'Column B','c':'Column C', 'v1':'Value 1', 'v2':'Value 2'}
# Create the pivot table, with a grand-total row named 'All'
r1 = pd.pivot_table(df, index=[col['a'],col['b'],col['c']], values=[col['v1'],col['v2']],
                    margins=True, margins_name='All', aggfunc='sum')
# Create first-level sums (one subtotal row per Column A value);
# the chained calls are wrapped in parentheses so they can span several lines
r1s2 = (r1.drop('All', level=0).groupby([col['a']]).sum()
        .assign(**{col['b']:'subtotal', col['c']:'(A)'})
        .set_index([col['b'],col['c']], append=True))
# Create second-level sums (one subtotal row per Column A / Column B pair)
r1s = (r1.drop('All', level=0).groupby([col['a'],col['b']]).sum()
       .assign(**{col['c']:'subtotal (B)'})
       .set_index(col['c'], append=True))
# Merge the three results together
r1 = pd.concat([r1,r1s2,r1s]).sort_index()
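For anyone who wants to run this, here is a minimal sketch of the same idea on made-up data. The sample values and the names `body`, `sub_a`, `sub_b` and `result` are assumptions for illustration only; they are not from the snippet above.

```python
import pandas as pd

col = {"a": "Column A", "b": "Column B", "c": "Column C",
       "v1": "Value 1", "v2": "Value 2"}

# Made-up sample data, only to exercise the approach.
df = pd.DataFrame({
    col["a"]: ["x", "x", "x", "y"],
    col["b"]: ["p", "p", "q", "p"],
    col["c"]: ["m", "n", "m", "m"],
    col["v1"]: [1, 2, 3, 4],
    col["v2"]: [10, 20, 30, 40],
})

# Pivot with a grand-total row named "All".
r1 = pd.pivot_table(
    df,
    index=[col["a"], col["b"], col["c"]],
    values=[col["v1"], col["v2"]],
    margins=True,
    margins_name="All",
    aggfunc="sum",
)

# Detail rows only, so the grand total is not counted in the subtotals.
body = r1.drop("All", level=0)

# One subtotal row per Column A value.
sub_a = (
    body.groupby(col["a"]).sum()
    .assign(**{col["b"]: "subtotal", col["c"]: "(A)"})
    .set_index([col["b"], col["c"]], append=True)
)

# One subtotal row per (Column A, Column B) pair.
sub_b = (
    body.groupby([col["a"], col["b"]]).sum()
    .assign(**{col["c"]: "subtotal (B)"})
    .set_index(col["c"], append=True)
)

# Stack everything and let the index sort interleave the subtotal rows.
result = pd.concat([r1, sub_a, sub_b]).sort_index()
print(result)
```

Note that the placement of the subtotal rows comes purely from `sort_index()`: here they land after the detail rows of their group because "subtotal" sorts after "p", "q", "m" and "n". With other labels the order could differ, so the subtotal labels may need adjusting.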