Multi-Index Pivot Table in Pandas

Question

I am trying to replicate a pivot table with pandas, using the sample dataset below, which is taken from a larger table:

year  type   status   paid  balance  count
2000  bank1  active   15      21      1
2001  bank2  default  27      40      1
2002  bank3  payoff   35     150      1
2003  bank4  closed   20      80      1
.       .       .      .       .      .
.       .       .      .       .      .
.       .       .      .       .      .

I build the pivot table with the code below:

pivot = (pd.pivot_table(df,
                        index=['status', 'type'],
                        values=['paid', 'balance', 'count'],
                        aggfunc="sum")
           .reset_index()
           .rename_axis(None, axis=1))

which outputs:


status    type    paid    balance   count

active   bank1     500      850         6
         bank2     450      800         8
         bank3     225      940        11
         bank4     580      990        15

payoff   bank1     xxx     xxx        xxx
         bank2     xxx     xxx        xxx
         bank3     xxx     xxx        xxx
         bank4     xxx     xxx        xxx
.
.
.
closed   bank1     xxx     xxx        xxx
         bank2     xxx     xxx        xxx
         bank3     xxx     xxx        xxx
         bank4     xxx     xxx        xxx

I want this output instead (the same as an Excel pivot table):

 type      paid    balance   count
 active    1755     3580       40  (running totals for active)
 bank1      500      850        6
 bank2      450      800        8
 bank3      225      940       11
 bank4      580      990       15

 payoff    xxx     xxx        xxx  (running totals)
 bank1     xxx     xxx        xxx
 bank2     xxx     xxx        xxx
 bank3     xxx     xxx        xxx
 bank4     xxx     xxx        xxx
.
.
.
 closed    xxx     xxx        xxx (running totals)
 bank1     xxx     xxx        xxx
 bank2     xxx     xxx        xxx
 bank3     xxx     xxx        xxx
 bank4     xxx     xxx        xxx

Answer 1

To do this, first create the pivot table as you did before, then write a function that computes the subtotals:

import pandas as pd

data = {
    'year': [2000, 2001, 2002, 2003, 2000, 2001, 2002, 2003],
    'type': ['bank1', 'bank2', 'bank3', 'bank4', 'bank1', 'bank2', 'bank3', 'bank4'],
    'status': ['active', 'active', 'active', 'active', 'payoff', 'payoff', 'payoff', 'payoff'],
    'paid': [15, 27, 35, 20, 10, 20, 30, 25],
    'balance': [21, 40, 150, 80, 10, 40, 60, 70],
    'count': [1, 1, 1, 1, 1, 1, 1, 1]
}
df = pd.DataFrame(data)

pivot = pd.pivot_table(df, index=['status', 'type'], values=['paid', 'balance', 'count'], aggfunc='sum')

def add_totals(pivot):
    """Return the pivot with a 'total' row placed above each status group."""
    parts = []
    for status, group in pivot.groupby(level=0):
        # Sum the group's columns and turn the resulting Series into a one-row frame
        total_row = group.sum().to_frame().T
        # Label the row (status, 'total') so it lines up with the pivot's MultiIndex
        total_row.index = pd.MultiIndex.from_tuples([(status, 'total')], names=pivot.index.names)
        # Subtotal first, then the group's own rows
        parts.extend([total_row, group])

    return pd.concat(parts)

pivot_with_totals = add_totals(pivot)

print(pivot_with_totals)

This gives you:

              balance  count  paid
status type
active total      291      4    97
       bank1       21      1    15
       bank2       40      1    27
       bank3      150      1    35
       bank4       80      1    20
payoff total      180      4    85
       bank1       10      1    10
       bank2       40      1    20
       bank3       60      1    30
       bank4       70      1    25
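
The layout requested in the question has a single label column, with each status name heading its own subtotal row. The following is a minimal sketch of one way to get that shape, not the original answer's method. The helper name `excel_style`, its `cols` parameter, and the reduced sample frame are assumptions made for illustration.

```python
import pandas as pd

# Small illustrative dataset (an assumption; same columns as the question's data)
df = pd.DataFrame({
    'type': ['bank1', 'bank2', 'bank1', 'bank2'],
    'status': ['active', 'active', 'payoff', 'payoff'],
    'paid': [15, 27, 10, 20],
    'balance': [21, 40, 10, 40],
    'count': [1, 1, 1, 1],
})

pivot = pd.pivot_table(df, index=['status', 'type'],
                       values=['paid', 'balance', 'count'], aggfunc='sum')

def excel_style(pivot, cols=('paid', 'balance', 'count')):
    """Hypothetical helper: flatten a (status, type) pivot into one label
    column, with each status subtotal directly above its member rows."""
    blocks = []
    for status, group in pivot.groupby(level='status'):
        # One-row frame holding the subtotal, labelled with the status name
        subtotal = group.sum().to_frame().T
        subtotal.index = [status]
        # Member rows keep only their 'type' label
        members = group.droplevel('status')
        blocks.append(pd.concat([subtotal, members]))
    # Turn the combined labels into an ordinary 'type' column
    flat = pd.concat(blocks).rename_axis('type').reset_index()
    return flat[['type', *cols]]

flat = excel_style(pivot)
print(flat)
```

Because the subtotal rows and the bank rows share one column here, the result is best treated as a presentation table: filter out the status rows before doing any further aggregation, or the subtotals will be counted twice.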
