Pandas: adding row subtotals in a multi-column pivot

Question (votes: 0, answers: 2)

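I have a dataframe built with from_records from a Django queryset, which I pivot on two columns to get a dashboard view of it. I managed to get the global sums by row and by column for the whole table, but I am trying to get sums per value of the first pivot column (a row subtotal for each group of the first column level).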

I know next to nothing about pandas, but I'm learning.

My dataframe looks like this:

    type                    amount      source  fund
0   Ressource Humaine CDD   -36470.36   Expense fund2
1   Mission                 -1686.47    Expense fund2
2   Fonctionnement          -817465.91  Expense fund1
3   Fonctionnement          1118691.65  Budget  fund1
4   Fonctionnement          -6000       Expense fund3
5   Fonctionnement          -23621.83   Expense fund2
6   Frais de Gestion        -53499      Expense fund2
7   Fonctionnement          15000       Budget  fund3
8   Frais de Gestion        53499       Budget  fund2
9   Fonctionnement          186718.78   Budget  fund2
10  Mission                 1686.47     Budget  fund2
11  Ressource Humaine CDD   38676.53    Budget  fund2
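
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample frame. The rows are copied from the table above and the name cpd matches the pivot call used in this question; the literal list of tuples is an assumption standing in for the Django queryset, which is not shown here.

```python
import pandas as pd

# Rows copied from the table in the question. In the real project they
# would come from a Django queryset instead of a literal list (assumption).
rows = [
    ("Ressource Humaine CDD", -36470.36, "Expense", "fund2"),
    ("Mission", -1686.47, "Expense", "fund2"),
    ("Fonctionnement", -817465.91, "Expense", "fund1"),
    ("Fonctionnement", 1118691.65, "Budget", "fund1"),
    ("Fonctionnement", -6000.0, "Expense", "fund3"),
    ("Fonctionnement", -23621.83, "Expense", "fund2"),
    ("Frais de Gestion", -53499.0, "Expense", "fund2"),
    ("Fonctionnement", 15000.0, "Budget", "fund3"),
    ("Frais de Gestion", 53499.0, "Budget", "fund2"),
    ("Fonctionnement", 186718.78, "Budget", "fund2"),
    ("Mission", 1686.47, "Budget", "fund2"),
    ("Ressource Humaine CDD", 38676.53, "Budget", "fund2"),
]

# from_records accepts a list of tuples plus explicit column names.
cpd = pd.DataFrame.from_records(rows, columns=["type", "amount", "source", "fund"])
print(cpd)
```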


To get an overview of fund availability in a dashboard, I pivot it as follows:

piv=cpd.pivot_table(index="type", columns=["fund","source"], values="amount", aggfunc='sum', margins=True, margins_name='Sum')

which gives:

fund                    fund1                   fund2                       fund3
source                  Budget      Expense     Budget      Expense         Budget      Expense
type
Fonctionnement          1118691.65  -817465.91  186718.78   -23621.83       15000.00    -6000.00
Frais de Gestion        NaN         NaN         53499.00    -53499.00       NaN         NaN
Mission                 NaN         NaN         1686.47     -1686.47        NaN         NaN
Ressource Humaine CDD   NaN         NaN         38676.53    -36470.36       NaN         NaN

(the margin totals are left out here, but I do get them)

I would like to end up with something like this:

fund                    fund1                                       fund2                                   fund3
source                  Budget      Expense         total fund1     Budget      Expense     total fund2     Budget      Expense     total fund3
type
Fonctionnement          1118691.65  -817465.91      301 226€        186718.78   -23621.83   163 097€        15000.00    -6000.00    9 000€
Frais de Gestion        NaN         NaN             NaN             53499.00    -53499.00   0               NaN         NaN         NaN 
Mission                 NaN         NaN             NaN             1686.47     -1686.47    0               NaN         NaN         NaN
Ressource Humaine CDD   NaN         NaN             NaN             38676.53    -36470.36   2 207€          NaN         NaN         NaN

I have seen some hints about using pandas concat on multi-index pivots (for example: "Pivot table subtotals in Pandas").

I have tried looping over the columns, reading the headers and so on, but I can't get any further because I'm a beginner!

How can I insert or append an intermediate column holding the sum, and how do I compute that subtotal?

python pandas pivot subtotal
2 Answers
0
votes

You can do a normal pivot and then compute and append the sums:

# do a normal pivot
df = df.pivot_table(
    index="type",
    columns=["fund", "source"],
    values="amount",
    aggfunc="sum",
)

# compute "sum" dataframes
dfs = []
for c in df.columns.get_level_values(0).unique():
    s = df.loc[:, c].sum(axis=1, skipna=False)
    dfs.append(pd.DataFrame(s, index=s.index, columns=[(c, f"Total {c}")]))

# concat them together, sort the columns:
out = pd.concat([df, pd.concat(dfs, axis=1)], axis=1)
out = out[sorted(out.columns)]
print(out)

Prints:

fund                        fund1                             fund2                          fund3                    
source                     Budget    Expense Total fund1     Budget   Expense Total fund2   Budget Expense Total fund3
type                                                                                                                  
Fonctionnement         1118691.65 -817465.91   301225.74  186718.78 -23621.83   163096.95  15000.0 -6000.0      9000.0
Frais de Gestion              NaN        NaN         NaN   53499.00 -53499.00        0.00      NaN     NaN         NaN
Mission                       NaN        NaN         NaN    1686.47  -1686.47        0.00      NaN     NaN         NaN
Ressource Humaine CDD         NaN        NaN         NaN   38676.53 -36470.36     2206.17      NaN     NaN         NaN
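
One detail of the loop above: with skipna=False a total becomes NaN as soon as any column of that fund is NaN for the row. In the sample data every fund has either both Budget and Expense or neither, so this does not show. If partial rows should still get a total, min_count=1 is one alternative. The sketch below uses made-up numbers (not from the question) only to illustrate the difference between the two settings.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column slice of a single fund; the values are made up.
fund = pd.DataFrame(
    {"Budget": [100.0, 50.0, np.nan], "Expense": [-40.0, np.nan, np.nan]},
    index=["both", "budget only", "neither"],
)

# NaN as soon as one value in the row is missing
strict = fund.sum(axis=1, skipna=False)
# NaN only when every value in the row is missing
lenient = fund.sum(axis=1, min_count=1)

print(pd.DataFrame({"skipna=False": strict, "min_count=1": lenient}))
```

Which behaviour is right depends on whether a missing Budget or Expense should count as zero or as unknown.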

0
votes

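I also used the approach of adding sum rows, but with a few extra features:

  • a summary for each level (the multi-index dataframe has 3 levels)
  • column names looked up from a dictionary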
import pandas as pd

# Map short keys to the real column names
col = {'a':'Column A','b':'Column B','c':'Column C', 'v1':'Value 1', 'v2':'Value 2'}

# Create the pivot table (df is the source dataframe holding the columns above)
r1 = pd.pivot_table(df, index=[col['a'],col['b'],col['c']], values=[col['v1'],col['v2']],
                    margins=True, margins_name='All', aggfunc='sum')

# First-level sums: one subtotal row per value of column A
r1s2 = (
    r1.drop('All', level=0).groupby([col['a']]).sum()
    .assign(**{col['b']:'subtotal', col['c']:'(A)'})
    .set_index([col['b'],col['c']], append=True)
)

# Second-level sums: one subtotal row per (A, B) pair
r1s = (
    r1.drop('All', level=0).groupby([col['a'],col['b']]).sum()
    .assign(**{col['c']:'subtotal (B)'})
    .set_index(col['c'], append=True)
)

# Merge the three results; sorting places each subtotal inside its group
r1 = pd.concat([r1,r1s2,r1s]).sort_index()
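
Since the snippet above does not show its input, here is a small self-contained sketch of the same idea on made-up data. The values, the single value column and the variable names sub_a and sub_b are assumptions for illustration, and the margins row is left out to keep it short.

```python
import pandas as pd

col = {"a": "Column A", "b": "Column B", "c": "Column C", "v1": "Value 1"}

# Hypothetical input data, made up for illustration.
df = pd.DataFrame({
    col["a"]: ["x", "x", "x", "y"],
    col["b"]: ["p", "p", "q", "p"],
    col["c"]: ["m", "n", "m", "m"],
    col["v1"]: [1, 2, 3, 4],
})

# Pivot with a three-level row index (no margins in this sketch).
r1 = pd.pivot_table(df, index=[col["a"], col["b"], col["c"]],
                    values=[col["v1"]], aggfunc="sum")

# Subtotal per value of A; the B and C levels carry the labels.
sub_a = (
    r1.groupby(col["a"]).sum()
    .assign(**{col["b"]: "subtotal", col["c"]: "(A)"})
    .set_index([col["b"], col["c"]], append=True)
)

# Subtotal per (A, B) pair; the C level carries the label.
sub_b = (
    r1.groupby([col["a"], col["b"]]).sum()
    .assign(**{col["c"]: "subtotal (B)"})
    .set_index(col["c"], append=True)
)

# Stack detail and subtotal rows, then sort so subtotals sit in their group.
out = pd.concat([r1, sub_a, sub_b]).sort_index()
print(out)
```

Note that the position of each subtotal row depends on how its label sorts against the real values in that level; here "subtotal" happens to sort after "p" and "q".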