How to filter part of a DataFrame with multi-index column levels using different conditions?

Question (votes: 0, answers: 2)

This is the original DataFrame:

                        country2
year                        2017     2018     2019
country1                       1   2    1   2    3   6
data_provider indicator
prov_1        ind_a           45  30   22  30   30  30
prov_2        ind_a           30  30   30  30   25  30
              ind_b           30  32   30  30   30  30
prov_3        ind_b           30  30   30  35   30  28

I want to filter the columns, applying a different condition per year, and end up with a new DataFrame like this:

# item                    country2
# year                        2017        2018     2019
# country1                       1     2     1   2    3   6
# data_provider indicator
# prov_1        ind_a         45.0   NaN  22.0 NaN  NaN NaN
# prov_2        ind_a          NaN   NaN   NaN NaN  NaN NaN
#               ind_b          NaN  32.0   NaN NaN  NaN NaN
# prov_3        ind_b          NaN   NaN   NaN NaN  NaN NaN

You can build the original DataFrame as follows:

import pandas as pd

df = pd.DataFrame(
    data={"data_provider": ["prov_1", "prov_1", "prov_2", "prov_2", "prov_3", "prov_3"],
          "indicator": ["ind_a", "ind_a", "ind_a", "ind_b", "ind_b", "ind_b"],
          "unit": ["EUR", "EUR", "EUR", "EUR", "EUR", "EUR"],
          "year": ["2017", "2018","2019", "2017","2018","2019"],
          "country1": [1, 1, 3, 2, 2, 6],
          "country2": [45, 22, 25, 32, 35, 28]
          }
)

df = df.pivot_table(
    index=['data_provider', 'indicator'],
    columns=['year', 'country1'],
    fill_value=30
)
df.columns.names = ['item', 'year', 'country1']

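This is how I get the new DataFrame:

  1. Find the two sets of target column labels

    x1 = df.columns[df.columns.get_level_values(level='year') == '2017']
    x2 = df.columns[df.columns.get_level_values(level='year') == '2018']

  2. Apply condition 1 to get newdf1

    df[x1] > 30                 # boolean mask over the 2017 columns
    newdf1 = df[df[x1] > 30]

  3. Apply condition 2 to get newdf2

    df[x2] < 30                 # boolean mask over the 2018 columns
    newdf2 = df[df[x2] < 30]

  4. Update newdf2 with newdf1

    newdf = newdf2.combine_first(newdf1)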

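In my solution, I first filter the original DataFrame with each condition separately, which gives two DataFrames, and then combine them. I would like to know whether there is a more direct way to achieve this.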
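For reference, the four steps above can be put together as one self-contained sketch. Passing `values=["country2"]` to `pivot_table` is an assumption that is not in the original snippet: it keeps the non-numeric `unit` column out of the aggregation, which newer pandas versions may otherwise reject, while still leaving `country2` as the top column level.

```python
import pandas as pd

raw = pd.DataFrame(
    data={
        "data_provider": ["prov_1", "prov_1", "prov_2", "prov_2", "prov_3", "prov_3"],
        "indicator": ["ind_a", "ind_a", "ind_a", "ind_b", "ind_b", "ind_b"],
        "unit": ["EUR", "EUR", "EUR", "EUR", "EUR", "EUR"],
        "year": ["2017", "2018", "2019", "2017", "2018", "2019"],
        "country1": [1, 1, 3, 2, 2, 6],
        "country2": [45, 22, 25, 32, 35, 28],
    }
)

# Assumption: values is given explicitly so that only country2 is aggregated.
df = raw.pivot_table(
    index=["data_provider", "indicator"],
    columns=["year", "country1"],
    values=["country2"],
    fill_value=30,
)
df.columns.names = ["item", "year", "country1"]

# Step 1: column labels for each target year
years = df.columns.get_level_values("year")
x1 = df.columns[years == "2017"]
x2 = df.columns[years == "2018"]

# Steps 2 and 3: indexing with a boolean DataFrame keeps matching cells
# and sets every other cell (including all other columns) to NaN
newdf1 = df[df[x1] > 30]
newdf2 = df[df[x2] < 30]

# Step 4: fill the NaN cells of newdf2 with the values kept in newdf1
newdf = newdf2.combine_first(newdf1)
print(newdf)
```

Only the 2017 cells above 30 and the 2018 cells below 30 should survive; everything else becomes NaN.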

python pandas dataframe multi-index
2 Answers
2
votes
import pandas as pd

df = pd.DataFrame(
    data={
        "data_provider": ["prov_1", "prov_1", "prov_2", "prov_2", "prov_3", "prov_3"],
        "indicator": ["ind_a", "ind_a", "ind_a", "ind_b", "ind_b", "ind_b"],
        "unit": ["EUR", "EUR", "EUR", "EUR", "EUR", "EUR"],
        "year": ["2017", "2018","2019", "2017","2018","2019"],
        "country1": [1, 1, 3, 2, 2, 6],
        "country2": [45, 22, 25, 32, 35, 28]
    }
)

df = df.pivot_table(
    index=['data_provider', 'indicator'],
    columns=['year', 'country1'],
    fill_value=30
)
df.columns.names = ['item', 'year', 'country1']


cond1 = (df.columns.get_level_values(level='year') == '2017') & df.gt(30)
cond2 = (df.columns.get_level_values(level='year') == '2018') & df.lt(30)
    
df_new = df.where(cond1 | cond2)
print(df_new)

Output:

                     country2                  
year                        2017       2018    2019
country1                       1   2    1   2   3   6
data_provider indicator                           
prov_1        ind_a         45.0 NaN  22.0 NaN NaN NaN
prov_2        ind_a          NaN NaN   NaN NaN NaN NaN
              ind_b          NaN NaN   NaN NaN NaN NaN
prov_3        ind_b          NaN NaN   NaN NaN NaN NaN
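The same idea can be sketched with the column mask broadcast explicitly in NumPy, which makes it easy to add more per-year conditions. This is only an illustration: the `rules` dictionary is a hypothetical helper, and `values=["country2"]` is an assumption added so that only the numeric column is aggregated.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame(
    data={
        "data_provider": ["prov_1", "prov_1", "prov_2", "prov_2", "prov_3", "prov_3"],
        "indicator": ["ind_a", "ind_a", "ind_a", "ind_b", "ind_b", "ind_b"],
        "unit": ["EUR", "EUR", "EUR", "EUR", "EUR", "EUR"],
        "year": ["2017", "2018", "2019", "2017", "2018", "2019"],
        "country1": [1, 1, 3, 2, 2, 6],
        "country2": [45, 22, 25, 32, 35, 28],
    }
)

# Assumption: values is given explicitly so that only country2 is aggregated.
df = raw.pivot_table(
    index=["data_provider", "indicator"],
    columns=["year", "country1"],
    values=["country2"],
    fill_value=30,
)
df.columns.names = ["item", "year", "country1"]

# Hypothetical mapping: year label -> condition applied to that year's columns
rules = {
    "2017": lambda d: d > 30,
    "2018": lambda d: d < 30,
}

years = df.columns.get_level_values("year")
keep = np.zeros(df.shape, dtype=bool)
for year, rule in rules.items():
    col_mask = np.asarray(years == year)      # shape (n_cols,)
    # (n_rows, n_cols) & (n_cols,) broadcasts the column mask over every row
    keep |= rule(df).to_numpy() & col_mask

# Cells where keep is False become NaN; years without a rule are dropped entirely
df_new = df.where(keep)
print(df_new)
```

With the sample data this should keep the 2017 cells above 30 and the 2018 cells below 30, without building intermediate DataFrames.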

0
votes

Code

I slightly modified @Amira Bedhiafi's code, because it did not seem to produce the desired output. (I marked my output with a red circle.)

cond1 = (df.columns.get_level_values(level='year') == '2017') & df.gt(30)
cond2 = (df.columns.get_level_values(level='year') == '2018') & df.lt(30)
df.where(cond1 | cond2)

Output:
