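Creating new calculated columns in a multi-index DataFrame based on the original column names while preserving the second-level format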


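I have a large multi-index DataFrame with many rows and columns. I want to add new columns whose names are derived from the original first-level column names, and each new column should keep the second-level layout of the multi-index. The new columns hold the change and the percentage change between columns. Ideally this would happen automatically, so that I don't have to create the new columns and their names by hand.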

Original:

import numpy as np
import pandas as pd


data = [[99,3,12,4,63,55]]

cols = pd.MultiIndex.from_product([['1. FY21','2. FY22','3. FY23'],['Values','Sites']])

df = pd.DataFrame(data, columns = cols)

print(df)


Desired output:

data_new = [[99,3,-36,52,-36,1733,12,4,51,51,425,1275,63,55]]

cols_new = pd.MultiIndex.from_product([['1. FY21','FY23-FY21','FY23-FY21_ % Change','2. FY22','FY23-FY22','FY23-FY22_ % Change','3. FY23'],['Values','Sites']])

df_new = pd.DataFrame(data_new, columns = cols_new)

print(df_new)
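
For reference, the numbers in data_new appear to follow from two formulas: change = last period minus earlier period, and % change = (last / earlier - 1) * 100, rounded to whole numbers. A minimal sketch checking this for FY21 against FY23, rebuilding the same frame as above (the names fy21, fy23, change and pct_change are only illustrative):

```python
import pandas as pd

cols = pd.MultiIndex.from_product(
    [["1. FY21", "2. FY22", "3. FY23"], ["Values", "Sites"]]
)
df = pd.DataFrame([[99, 3, 12, 4, 63, 55]], columns=cols)

# Selecting a first-level label returns the Values/Sites sub-frame.
fy21 = df["1. FY21"]
fy23 = df["3. FY23"]

change = fy23 - fy21                  # Values: 63 - 99, Sites: 55 - 3
pct_change = (fy23 / fy21 - 1) * 100  # relative change in percent

print(change)
print(pct_change.round(2))
```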

python pandas dataframe multi-index calculated-columns
1 Answer

Try:

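# Unique first-level labels, in their original order
cols = df.columns.get_level_values(0).unique()
# Sub-frame of the last period (here "3. FY23"); xs drops the first level
last = df.xs(cols[-1], level=0, axis=1)

all_dfs = []
for c in cols[:-1]:
    # Sub-frame of the earlier period, with only the second level as columns
    o = df.xs(c, level=0, axis=1)

    # Absolute change; c.split()[-1] turns "1. FY21" into "FY21"
    d = last - o
    d.columns = pd.MultiIndex.from_product([[f"{cols[-1]}-{c.split()[-1]}"], d.columns])

    # Percentage change relative to the earlier period
    ch = (last / o - 1) * 100
    ch.columns = pd.MultiIndex.from_product(
        [[f"{cols[-1]}-{c.split()[-1]}_% Change"], ch.columns]
    )

    # Put the first level back on the original block
    o.columns = pd.MultiIndex.from_product([[c], o.columns])

    all_dfs.extend([o, d, ch])

# Restore the first level on the last period and append it at the end
last.columns = pd.MultiIndex.from_product([[cols[-1]], last.columns])
all_dfs.append(last)

# Join all blocks side by side in the order they were collected
out = pd.concat(all_dfs, axis=1)
print(out)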

This prints:

  1. FY21       3. FY23-FY21       3. FY23-FY21_% Change              2. FY22       3. FY23-FY22       3. FY23-FY22_% Change         3. FY23      
   Values Sites       Values Sites                Values        Sites  Values Sites       Values Sites                Values   Sites  Values Sites
0      99     3          -36    52            -36.363636  1733.333333      12     4           51    51                 425.0  1275.0      63    55
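
Note that the labels printed above carry the full name of the last period ("3. FY23-FY21"), whereas the question asked for "FY23-FY21" and "FY23-FY21_ % Change". A possible variant, sketched under the assumption that every first-level label has the form "<n>. <tag>", strips the numeric prefix from both sides and lets pd.concat build the first level from its keys argument. The function name add_change_columns is hypothetical, not part of pandas:

```python
import pandas as pd


def add_change_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Insert change and % change blocks (versus the last period) after each earlier period."""
    periods = df.columns.get_level_values(0).unique()
    last = periods[-1]
    last_tag = last.split()[-1]  # assumes labels like "3. FY23" -> "FY23"

    labels, blocks = [], []
    for period in periods[:-1]:
        tag = period.split()[-1]  # "1. FY21" -> "FY21"
        labels += [period, f"{last_tag}-{tag}", f"{last_tag}-{tag}_ % Change"]
        blocks += [
            df[period],                          # original block
            df[last] - df[period],               # absolute change
            (df[last] / df[period] - 1) * 100,   # percentage change
        ]
    labels.append(last)
    blocks.append(df[last])

    # keys become the first column level; each block keeps its own second level
    return pd.concat(blocks, axis=1, keys=labels)


cols = pd.MultiIndex.from_product(
    [["1. FY21", "2. FY22", "3. FY23"], ["Values", "Sites"]]
)
df = pd.DataFrame([[99, 3, 12, 4, 63, 55]], columns=cols)

out = add_change_columns(df)
print(out.round(2))
```

Passing the blocks as a list together with keys keeps the column order explicit, and it avoids reassigning .columns on each sub-frame.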