Function to filter a dataframe based on multiple conditions using groupby and drop duplicates


I have a dataframe and want to write a function that, depending on certain conditions, either keeps rows or drops duplicates.

Original dataframe

year  year_month   manager_movement    email_address
2022  2022_jun     transfer_in         [email protected]
2022  2022_jun     no_change           [email protected]
2022  2022_jul     no_change           [email protected]
2022  2022_jul     no_change           [email protected]
2022  2022_aug     no_change           [email protected]
2022  2022_aug     no_change           [email protected]
2022  2022_sep     transfer_out        [email protected]
2022  2022_sep     no_change           [email protected]
2022  2022_oct     transfer_in         [email protected]
2022  2022_oct     no_change           [email protected]
2023  2023_jan     no_change           [email protected]
2023  2023_feb     no_change           [email protected]

Expected dataframe

year  year_month   manager_movement    email_address
2022  2022_jun     transfer_in         [email protected]
2022  2022_oct     transfer_in         [email protected]
2022  2022_oct     no_change           [email protected]
2023  2023_feb     no_change           [email protected]

The logic for getting this dataframe is:

1. If df['manager_movement'] == 'transfer_out', drop the row.
2. Else if df['manager_movement'] == 'transfer_in', keep all such rows.
3. Else if df['manager_movement'] == 'no_change', group by 'year' and 'email_address', drop the duplicates and keep the last row.
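Rule 3 can also be written with groupby, which the title mentions. The sketch below is a minimal illustration on hypothetical placeholder data (the real email addresses are obfuscated in this post): it shows that groupby(...).tail(1) on the no_change rows picks the same rows as drop_duplicates(..., keep='last').

```python
import pandas as pd

# Hypothetical no_change rows; the email addresses are placeholders, not the question's data
no_change = pd.DataFrame({
    "year": [2022, 2022, 2022, 2023],
    "year_month": ["2022_jul", "2022_aug", "2022_oct", "2023_feb"],
    "email_address": ["a@example.com", "a@example.com", "b@example.com", "a@example.com"],
})

# groupby(...).tail(1) keeps the last row of each (year, email_address) group,
# returning the original rows with their original index and order
via_groupby = no_change.groupby(["year", "email_address"]).tail(1)

# drop_duplicates(..., keep="last") keeps the last occurrence of each key combination
via_drop = no_change.drop_duplicates(["year", "email_address"], keep="last")

print(via_groupby.index.tolist())    # [1, 2, 3]
print(via_groupby.equals(via_drop))  # True
```

One difference to keep in mind: groupby drops rows whose key is NaN by default (dropna=True), while drop_duplicates treats NaN keys as equal to each other, so the two can diverge if year or email_address has missing values.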

Here is my attempt, but I can't get the output I want. Any help or comments are appreciated, thanks.

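def get_required_rows(x):
    # NOTE: with axis=1, x is a single row (a Series), not the whole DataFrame,
    # so the row filtering and drop_duplicates calls below cannot work across rows.
    if x['manager_movement'] == 'transfer_out':
        return x.loc[x['manager_movement'] != 'transfer_out']
    elif x['manager_movement'] == 'transfer_in':
        return x
    elif x['manager_movement'] == 'No Change':  # never matches the data's 'no_change'
        return x.drop_duplicates(['year', 'email_address'], keep='last')

df_filtered = df.apply(get_required_rows, axis=1)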
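Part of why the attempt above cannot work is what apply(axis=1) passes to the function. This small sketch on a hypothetical two-row frame shows that the function is called once per row and receives that row as a Series, so comparisons like x['manager_movement'] == 'transfer_out' produce a single bool and there are no other rows to deduplicate against.

```python
import pandas as pd

# Hypothetical two-row frame, only used to inspect what apply(axis=1) hands to the function
df = pd.DataFrame({
    "year": [2022, 2022],
    "manager_movement": ["transfer_in", "no_change"],
})

# With axis=1 the callable runs once per row and gets that row as a pandas Series
kinds = df.apply(lambda row: type(row).__name__, axis=1)
print(kinds.tolist())  # ['Series', 'Series']
```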
1 answer

How about filtering each case separately and concatenating the results:

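import pandas as pd

# Keep every 'transfer_in' row and only the last 'no_change' row per (year, email_address);
# 'transfer_out' rows are never selected, so they are dropped.
pd.concat([
    df[df["manager_movement"] == "transfer_in"],
    df[df["manager_movement"] == "no_change"].drop_duplicates(["year", "email_address"], keep="last"),
])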

Output:

    year year_month manager_movement         email_address
0   2022   2022_jun      transfer_in    [email protected]
8   2022   2022_oct      transfer_in    [email protected]
4   2022   2022_aug        no_change    [email protected]
9   2022   2022_oct        no_change  [email protected]
11  2023   2023_feb        no_change  [email protected]
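The concatenated result lists all transfer_in rows first and the no_change rows after them, as the index order above shows. If you want the rows back in their original order, one option is to chain sort_index(), since pd.concat keeps the original index labels. A minimal sketch on hypothetical placeholder data:

```python
import pandas as pd

# Hypothetical stand-in data (the real email addresses are obfuscated in the question)
df = pd.DataFrame({
    "year": [2022, 2022, 2022, 2022],
    "year_month": ["2022_jun", "2022_jul", "2022_aug", "2022_oct"],
    "manager_movement": ["no_change", "no_change", "transfer_out", "transfer_in"],
    "email_address": ["b@example.com", "b@example.com", "a@example.com", "a@example.com"],
})

result = pd.concat([
    df[df["manager_movement"] == "transfer_in"],
    df[df["manager_movement"] == "no_change"].drop_duplicates(["year", "email_address"], keep="last"),
]).sort_index()  # concat keeps the original index labels, so sorting by them restores row order

print(result.index.tolist())  # [1, 3]
```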

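(By the way, your expected output doesn't seem to follow your stated rules; it is missing one row: the no_change row for [email protected], which is index 4 in the output above.)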
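Since the question asks for a function, here is one way to wrap the same three rules into a helper that works on the whole frame instead of row by row. This is a sketch, not the answerer's code: the name filter_movements is hypothetical, it assumes the dataframe has a unique index (e.g. the default RangeIndex), and it builds a boolean mask so the rows keep their original order.

```python
import pandas as pd


def filter_movements(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper applying the question's three rules to the whole frame.

    Assumes df has a unique index (e.g. the default RangeIndex).
    """
    movement = df["manager_movement"]
    # Rule 2: every transfer_in row is kept
    keep_in = movement == "transfer_in"
    # Rule 3: among no_change rows, keep only the last row per (year, email_address)
    last_no_change = (
        df[movement == "no_change"]
        .drop_duplicates(["year", "email_address"], keep="last")
        .index
    )
    keep_no_change = df.index.isin(last_no_change)
    # Rule 1: transfer_out rows match neither mask, so they are dropped
    return df[keep_in | keep_no_change]


# Hypothetical placeholder data, not the question's (obfuscated) rows
df = pd.DataFrame({
    "year": [2022, 2022, 2022, 2022, 2023, 2023],
    "year_month": ["2022_jun", "2022_jun", "2022_jul", "2022_sep", "2023_jan", "2023_feb"],
    "manager_movement": ["transfer_in", "no_change", "no_change", "transfer_out",
                         "no_change", "no_change"],
    "email_address": ["a@example.com", "b@example.com", "b@example.com", "a@example.com",
                      "b@example.com", "b@example.com"],
})

print(filter_movements(df).index.tolist())  # [0, 2, 5]
```

Unlike the concat version, the mask approach never reorders rows, so no sort_index() call is needed afterwards.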

© www.soinside.com 2019 - 2024. All rights reserved.