Squeeze rows with missing values in a multi-index DataFrame


Consider the following multi-index pd.DataFrame, which has many missing values:

import numpy as np
import pandas as pd

# Create multi-index
index = pd.MultiIndex.from_tuples(
    [
        ("A", "X", "I"),
        ("A", "X", "I"),
        ("A", "Y", "I"),
        ("A", "Y", "II"),
        ("A", "Y", "I"),
    ],
    names=["level_1", "level_2", "level_3"],
)

# Create dataframe
data = [[1, np.nan], [np.nan, 1], [np.nan, 1], [np.nan, 1], [1, np.nan]]
df = pd.DataFrame(data, index=index, columns=["column1", "column2"])

print(df)

                         column1  column2
level_1 level_2 level_3                  
A       X       I            1.0      NaN
                I            NaN      1.0
        Y       I            NaN      1.0
                II           NaN      1.0
                I            1.0      NaN

How can I squeeze the rows together as much as possible? I am looking for the following result:

                         column1  column2
level_1 level_2 level_3                  
A       X       I            1.0      1.0
        Y       I            1.0      1.0
                II           NaN      1.0
Answer

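If aggregation is acceptable (here each index group has at most one non-missing value per column, so nothing is lost), aggregate the values per index, e.g. with mean: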

df = df.groupby(level=[0,1,2]).mean()
print(df)
                         column1  column2
level_1 level_2 level_3                  
A       X       I            1.0      1.0
        Y       I            1.0      1.0
                II           NaN      1.0
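The mean only reproduces the desired output because no (level_1, level_2, level_3) group has more than one non-missing value in any column; otherwise those values would be averaged together. A minimal sketch, not part of the original answer, that rebuilds the question's frame, checks that condition, and uses GroupBy.first, which returns the first non-null value of each column per group instead of averaging (the names levels, non_null_counts, lossless and squeezed are illustrative):

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question
levels = ["level_1", "level_2", "level_3"]
index = pd.MultiIndex.from_tuples(
    [("A", "X", "I"), ("A", "X", "I"), ("A", "Y", "I"), ("A", "Y", "II"), ("A", "Y", "I")],
    names=levels,
)
data = [[1, np.nan], [np.nan, 1], [np.nan, 1], [np.nan, 1], [1, np.nan]]
df = pd.DataFrame(data, index=index, columns=["column1", "column2"])

# Count non-missing values per group and column; any aggregation is
# lossless only if no group has more than one value in any column
non_null_counts = df.notna().groupby(level=levels).sum()
lossless = bool((non_null_counts <= 1).all().all())
print(lossless)

# first() skips NaN and keeps the first non-null value of each column per group;
# a group whose column is entirely NaN stays NaN
squeezed = df.groupby(level=levels).first()
print(squeezed)
```

On this data first() and mean() give identical frames; they differ only when a group holds several values in one column, where first() keeps the first value rather than averaging.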

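If you want to avoid aggregation, sort each column within its group so that non-missing values come first, then drop the rows that end up entirely NaN: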

# Within each group, sort every column so that non-missing values come first
f = lambda g: g.apply(lambda col: col.sort_values(key=lambda s: s.isna()))

df = df.groupby(level=[0,1,2], group_keys=False).apply(f).dropna(how='all')
print(df)
                         column1  column2
level_1 level_2 level_3                  
A       X       I            1.0      1.0
        Y       I            1.0      1.0
                II           NaN      1.0
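The sort-based version lines values up positionally inside each group. A more explicit sketch of the same idea, not from the original answer, uses a hypothetical helper squeeze_group that stacks each column's non-missing values at the top of its group and pads the rest with NaN. It assumes numeric columns (the padding array is float) and also shows a made-up group with two values in the same column, where both rows are kept instead of being averaged:

```python
import numpy as np
import pandas as pd

levels = ["level_1", "level_2", "level_3"]


def squeeze_group(group: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: move each column's non-NaN values to the top of
    the group, pad with NaN, and drop rows that end up entirely NaN."""
    columns = {}
    for name in group.columns:
        values = group[name].dropna().to_numpy()
        # Pad with NaN so every column keeps the group's original length
        padded = np.full(len(group), np.nan)
        padded[: len(values)] = values
        columns[name] = padded
    # All rows in a group share the same index label, so reusing it is safe
    out = pd.DataFrame(columns, index=group.index)
    return out.dropna(how="all")


# Example frame from the question
index = pd.MultiIndex.from_tuples(
    [("A", "X", "I"), ("A", "X", "I"), ("A", "Y", "I"), ("A", "Y", "II"), ("A", "Y", "I")],
    names=levels,
)
data = [[1, np.nan], [np.nan, 1], [np.nan, 1], [np.nan, 1], [1, np.nan]]
df = pd.DataFrame(data, index=index, columns=["column1", "column2"])
result = df.groupby(level=levels, group_keys=False).apply(squeeze_group)
print(result)

# Illustrative group with two values in column1: both are kept
index2 = pd.MultiIndex.from_tuples([("A", "X", "I")] * 3, names=levels)
df2 = pd.DataFrame(
    {"column1": [1.0, np.nan, 2.0], "column2": [np.nan, 5.0, np.nan]}, index=index2
)
result2 = df2.groupby(level=levels, group_keys=False).apply(squeeze_group)
print(result2)
```

For df2, mean() would collapse column1 to 1.5, whereas this version keeps the rows [1.0, 5.0] and [2.0, NaN].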
© www.soinside.com 2019 - 2024. All rights reserved.