Pandas: add multiple columns to a multiindex column dataframe

Question

This question is an attempt to generalize the solution provided for this question:

Pandas: add a column to a multiindex column dataframe

I need to produce a column for each column index.

The solution provided by spencerlyon2 works when we want to add a single column:

df['bar', 'three'] = [0, 1, 2]

But I would like to generalize this operation to every first-level column index.

Source DF:

In [1]: df
Out[2]:
first        bar                 baz
second       one       two       one       two
A      -1.089798  2.053026  0.470218  1.440740
B       0.488875  0.428836  1.413451 -0.683677
C      -0.243064 -0.069446 -0.911166  0.478370

Target DF below, where each three column must be the sum of the one and two columns under the same first-level index.

In [1]: df
Out[2]:
first        bar                           baz
second       one       two     three       one       two     three
A      -1.089798  2.053026  0.963228  1.440740 -2.317647 -0.876907
B       0.488875  0.428836  0.917711 -0.683677  0.345873 -0.337804
C      -0.243064 -0.069446 -0.312510  0.478370  0.266761  0.745131

Answer 1

I started from your sample input:

first        bar                 baz          
second       one       two       one       two
A      -1.089798  2.053026  0.470218  1.440740
B       0.488875  0.428836  1.413451 -0.683677
C      -0.243064 -0.069446 -0.911166  0.478370

To add a new column under each level-0 value of the MultiIndex columns, you can run something like:

for c1 in df.columns.get_level_values('first').unique():
    # New column int index
    cInd = int(df.columns.get_loc(c1).stop)
    col = (c1, 'three')      # New column name
    newVal = df[(c1, 'one')] + df[(c1, 'two')]
    df.insert(loc=cInd, column=col, value=newVal)  # Insert the new column

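In the example above, each new column holds the sum of the one and two columns of its group, but you can set the values to whatever you need.

The result of my code (after column sorting) is: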

first        bar                           baz                    
second       one       two     three       one       two     three
A      -1.089798  2.053026  0.963228  0.470218  1.440740  1.910958
B       0.488875  0.428836  0.917711  1.413451 -0.683677  0.729774
C      -0.243064 -0.069446 -0.312510 -0.911166  0.478370 -0.432796

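Note that inserting each new column after all existing columns that share its top-level name gives the correct column order, whereas in the other solutions the columns end up sorted alphabetically (one, three, two), which looks odd in this case.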
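For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same insert-per-group idea. The function name add_sum_column and its parameters are made up for illustration, and the insert position is computed with np.flatnonzero instead of get_loc(...).stop, which is a small substitution of mine so that the sketch does not depend on get_loc returning a slice.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question.
columns = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(
    [
        [-1.089798, 2.053026, 0.470218, 1.440740],
        [0.488875, 0.428836, 1.413451, -0.683677],
        [-0.243064, -0.069446, -0.911166, 0.478370],
    ],
    index=["A", "B", "C"],
    columns=columns,
)


def add_sum_column(frame, new_name="three", parts=("one", "two")):
    """Hypothetical helper: insert (group, new_name) right after each group's last column."""
    out = frame.copy()  # leave the caller's frame untouched
    for top in out.columns.get_level_values(0).unique():
        # Integer positions of every column currently under this top-level label.
        positions = np.flatnonzero(out.columns.get_level_values(0) == top)
        loc = int(positions.max()) + 1  # one past the group's last column
        value = out[(top, parts[0])] + out[(top, parts[1])]
        out.insert(loc=loc, column=(top, new_name), value=value)
    return out


result = add_sum_column(df)
print(result)
```

Because the helper works on a copy, df itself keeps its original four columns, and calling it twice on the same result would raise, since insert refuses duplicate column labels by default.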

Answer 2

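You can use join with two dataframes that share the same index to create a bunch of columns all at once.

First, calculate the sum using groupby against axis=1: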

ndf = df.groupby(df.columns.get_level_values(0), axis=1).sum()

        bar       baz
A  0.963228  1.910958
B  0.917711  0.729774
C -0.312510 -0.432796

(PS: if you have more than two columns, you can do

df.loc[:, (slice(None), ['one', 'two'])].groupby(df.columns.get_level_values(0), axis=1).sum()

to first slice only the columns 'one' and 'two', and only then groupby.)
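As far as I know, recent pandas releases (2.1 and later) warn that the axis=1 argument of groupby is deprecated. A sketch of an equivalent that avoids it, assuming the sample frame from the question: transpose, group the rows on the top level, sum, and transpose back.

```python
import pandas as pd

# Rebuild the sample frame from the question.
columns = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(
    [
        [-1.089798, 2.053026, 0.470218, 1.440740],
        [0.488875, 0.428836, 1.413451, -0.683677],
        [-0.243064, -0.069446, -0.911166, 0.478370],
    ],
    index=["A", "B", "C"],
    columns=columns,
)

# After the transpose the column levels are row levels, so a plain
# row-wise groupby on the 'first' level does the per-group sum.
ndf = df.T.groupby(level="first").sum().T

# Same idea, restricted to the 'one' and 'two' columns only.
subset = df.loc[:, (slice(None), ["one", "two"])]
ndf_subset = subset.T.groupby(level="first").sum().T

print(ndf)
```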

Then, make it match your column index, i.e. make it a MultiIndexed dataframe just like the original dataframe:

ndf.columns = pd.MultiIndex.from_product([ndf.columns, ['three']])

        bar       baz
      three     three
A  0.963228  1.910958
B  0.917711  0.729774
C -0.312510 -0.432796

Finally, df.join:

finaldf = df.join(ndf).sort_index(axis=1)

If you really care about the ordering, use reindex:

finaldf.reindex(['one', 'two', 'three'], axis=1, level=1)
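Putting the steps of this answer together, a minimal runnable sketch might look like the following. It assumes the sample frame from the question, and it computes the per-group sums with xs rather than groupby, which is my substitution and not part of the answer.

```python
import pandas as pd

# Rebuild the sample frame from the question.
columns = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(
    [
        [-1.089798, 2.053026, 0.470218, 1.440740],
        [0.488875, 0.428836, 1.413451, -0.683677],
        [-0.243064, -0.069446, -0.911166, 0.478370],
    ],
    index=["A", "B", "C"],
    columns=columns,
)

# Cross-sections on the second level give one plain column per group
# (bar, baz), so adding them yields the per-group sums.
ndf = df.xs("one", axis=1, level="second") + df.xs("two", axis=1, level="second")

# Lift the result to the same two-level column layout as df.
ndf.columns = pd.MultiIndex.from_product(
    [ndf.columns, ["three"]], names=df.columns.names
)

# Join on the shared row index, sort the columns, then restore the
# one/two/three order inside each top-level group.
finaldf = df.join(ndf).sort_index(axis=1)
finaldf = finaldf.reindex(["one", "two", "three"], axis=1, level=1)

print(finaldf)
```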
