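I have a large DataFrame with MultiIndex columns and many rows and columns. I want to add new columns whose names are derived from the original first-level column names, and each new column should keep the second-level structure of the MultiIndex. The new columns hold the change and the percentage change between columns. Ideally this would happen automatically, so that I don't have to create the new columns and their names by hand.
Original: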
import numpy as np
import pandas as pd
data = [[99,3,12,4,63,55]]
cols = pd.MultiIndex.from_product([['1. FY21','2. FY22','3. FY23'],['Values','Sites']])
df = pd.DataFrame(data, columns = cols)
print(df)
Desired output:
data_new = [[99,3,-36,52,-36,1733,12,4,51,51,425,1275,63,55]]
cols_new = pd.MultiIndex.from_product([['1. FY21','FY23-FY21','FY23-FY21_ % Change','2. FY22','FY23-FY22','FY23-FY22_ % Change','3. FY23'],['Values','Sites']])
df_new = pd.DataFrame(data_new, columns = cols_new)
print(df_new)
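The figures in the desired output appear to be the last fiscal year minus each earlier year, plus the same comparison as a percentage (rounded to whole numbers). A minimal sketch for a single pair of groups, assuming the same sample frame as above; the variable names `first`, `last`, `change` and `pct_change` are only illustrative:

```python
import pandas as pd

# Same single-row frame as in the question.
cols = pd.MultiIndex.from_product(
    [["1. FY21", "2. FY22", "3. FY23"], ["Values", "Sites"]]
)
df = pd.DataFrame([[99, 3, 12, 4, 63, 55]], columns=cols)

# Selecting a first-level label returns a frame that keeps only the
# second level ("Values", "Sites") as its columns.
first = df["1. FY21"]
last = df["3. FY23"]

change = last - first                  # absolute change, FY23 minus FY21
pct_change = (last / first - 1) * 100  # relative change in percent

print(change)
print(pct_change.round(2))
```

Because both frames share the same second-level labels, the subtraction and division align column by column without any extra work.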
Try:
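# Unique first-level labels, in their original order.
cols = df.columns.get_level_values(0).unique()
# The last first-level group (here "3. FY23") is the baseline for every comparison.
last = df.xs(cols[-1], level=0, axis=1)
all_dfs = []
for c in cols[:-1]:
    # Sub-frame for the current group; only the second level remains as columns.
    o = df.xs(c, level=0, axis=1)
    # Absolute change: last group minus current group.
    d = last - o
    d.columns = pd.MultiIndex.from_product([[f"{cols[-1]}-{c.split()[-1]}"], d.columns])
    # Percentage change relative to the current group.
    ch = (last / o - 1) * 100
    ch.columns = pd.MultiIndex.from_product(
        [[f"{cols[-1]}-{c.split()[-1]}_% Change"], ch.columns]
    )
    # Restore the first level on the original group so all pieces line up.
    o.columns = pd.MultiIndex.from_product([[c], o.columns])
    all_dfs.extend([o, d, ch])
# Append the baseline group last, with its first level restored.
last.columns = pd.MultiIndex.from_product([[cols[-1]], last.columns])
all_dfs.append(last)
out = pd.concat(all_dfs, axis=1)
print(out)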
Prints:
   1. FY21         3. FY23-FY21         3. FY23-FY21_% Change               2. FY22         3. FY23-FY22         3. FY23-FY22_% Change          3. FY23
    Values  Sites        Values  Sites                 Values        Sites   Values  Sites        Values  Sites                 Values   Sites   Values  Sites
0       99      3           -36     52             -36.363636  1733.333333       12      4            51     51                  425.0  1275.0       63     55
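The labels printed above start with "3. FY23-..." because the full first-level label of the last group is used, whereas the desired output asks for "FY23-FY21". One possible variant, sketched below, strips the numeric prefix from both labels and lets `pd.concat` build the first level through its `keys` argument. The function name `add_change_columns` is hypothetical, and the suffix follows the "_% Change" spelling used in the code above:

```python
import pandas as pd


def add_change_columns(df):
    """Hypothetical helper: compare every first-level group with the last one."""
    groups = df.columns.get_level_values(0).unique()
    last_label = groups[-1]
    last_short = last_label.split()[-1]  # "3. FY23" -> "FY23"
    last = df[last_label]

    names, frames = [], []
    for label in groups[:-1]:
        current = df[label]
        name = f"{last_short}-{label.split()[-1]}"  # e.g. "FY23-FY21"
        names += [label, name, f"{name}_% Change"]
        frames += [current, last - current, (last / current - 1) * 100]
    names.append(last_label)
    frames.append(last)

    # `keys` supplies the new first column level, one key per frame.
    return pd.concat(frames, axis=1, keys=names)


# Sample frame from the question.
cols = pd.MultiIndex.from_product(
    [["1. FY21", "2. FY22", "3. FY23"], ["Values", "Sites"]]
)
df = pd.DataFrame([[99, 3, 12, 4, 63, 55]], columns=cols)

out = add_change_columns(df)
print(out.columns.get_level_values(0).unique().tolist())
```

This sketch assumes every first-level label ends with the short name after a space (as in "1. FY21"); labels in another format would need a different way of deriving the short name. Calling `.round()` on the result would bring the percentages closer to the whole numbers shown in the desired output.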