Pandas: add a column based on a groupby with a condition


I have a DataFrame with four columns: id1, id2, age, stime. For example:

import pandas as pd

# A plain list of lists lets pandas infer a dtype per column (int64 for id1/id2/age,
# a datetime64 dtype for stime); wrapping it in np.array would first pack the mixed
# values into a single object-dtype array.
df = pd.DataFrame([[1, 1, 3, pd.to_datetime('2020-01-10 00:30:16')],
                   [2, 1, 10, pd.to_datetime('2020-01-27 00:20:20')],
                   [3, 1, 60, pd.to_datetime('2020-01-26 00:10:08')],
                   [4, 2, 1, pd.to_datetime('2020-01-13 00:20:19')],
                   [5, 2, 2, pd.to_datetime('2020-01-12 00:40:17')],
                   [6, 2, 3, pd.to_datetime('2020-01-10 00:10:53')],
                   [7, 3, 20, pd.to_datetime('2020-01-21 00:20:57')],
                   [8, 3, 40, pd.to_datetime('2020-01-20 00:10:38')],
                   [9, 3, 60, pd.to_datetime('2020-01-01 00:30:38')]],
                  columns=['id1', 'id2', 'age', 'stime'])

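I want to add a column whose value is the maximum age among the rows that have the same id2 and whose stime falls within the two weeks leading up to (and including) that row's stime. So, for the example above, I would like to get: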

df2 = pd.DataFrame([[1, 1, 3, pd.to_datetime('2020-01-10 00:30:16'), 3],
                    [2, 1, 10, pd.to_datetime('2020-01-27 00:20:20'), 60],
                    [3, 1, 60, pd.to_datetime('2020-01-26 00:10:08'), 60],
                    [4, 2, 1, pd.to_datetime('2020-01-13 00:20:19'), 3],
                    [5, 2, 2, pd.to_datetime('2020-01-12 00:40:17'), 3],
                    [6, 2, 3, pd.to_datetime('2020-01-10 00:10:53'), 3],
                    [7, 3, 20, pd.to_datetime('2020-01-21 00:20:57'), 40],
                    [8, 3, 40, pd.to_datetime('2020-01-20 00:10:38'), 40],
                    [9, 3, 60, pd.to_datetime('2020-01-01 00:30:38'), 60]],
                   columns=['id1', 'id2', 'age', 'stime', 'max_age_last_2w'])

Since the DataFrame I need to run this on is large, any help on how to do this efficiently would be much appreciated. Thanks in advance!
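For reference, the rule above can be written down directly as a naive per-row scan. This is a minimal sketch, not an efficient solution: it compares every row with every other row in the frame, so it is only meant for checking a faster approach on small samples. The helper name max_age_last_2w_naive and the constant WINDOW are hypothetical, and the window is assumed to be inclusive on both ends, [stime - 14 days, stime], since the question does not say how the boundary should be treated.

```python
import pandas as pd

# Sample data from the question; stime is parsed into a datetime64 column.
df = pd.DataFrame(
    [[1, 1, 3, '2020-01-10 00:30:16'],
     [2, 1, 10, '2020-01-27 00:20:20'],
     [3, 1, 60, '2020-01-26 00:10:08'],
     [4, 2, 1, '2020-01-13 00:20:19'],
     [5, 2, 2, '2020-01-12 00:40:17'],
     [6, 2, 3, '2020-01-10 00:10:53'],
     [7, 3, 20, '2020-01-21 00:20:57'],
     [8, 3, 40, '2020-01-20 00:10:38'],
     [9, 3, 60, '2020-01-01 00:30:38']],
    columns=['id1', 'id2', 'age', 'stime'],
)
df['stime'] = pd.to_datetime(df['stime'])

# Hypothetical constant: the two-week look-back.
WINDOW = pd.Timedelta(days=14)


def max_age_last_2w_naive(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: for each row, the max age among rows with the same id2
    whose stime lies in [stime - 14 days, stime] (both ends inclusive, an assumption).
    Every row is compared with every other row, so this is O(n^2): small checks only."""
    values = []
    for _, row in frame.iterrows():
        mask = (
            (frame['id2'] == row['id2'])
            & (frame['stime'] <= row['stime'])
            & (frame['stime'] >= row['stime'] - WINDOW)
        )
        values.append(frame.loc[mask, 'age'].max())
    return pd.Series(values, index=frame.index)


df['max_age_last_2w'] = max_age_last_2w_naive(df)
print(df)
```

On the sample data this reproduces the max_age_last_2w column of df2 (3, 60, 60, 3, 3, 3, 40, 40, 60).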

python pandas
1 Answer

Try:

df['max_age_last_2w'] = df.groupby(['id2', pd.Grouper(key='stime', freq='2W')])['age'].transform('max')

Output:

  id1 id2 age               stime  max_age_last_2w
0   1   1   3 2020-01-10 00:30:16                3
1   2   1  10 2020-01-27 00:20:20               60
2   3   1  60 2020-01-26 00:10:08               60
3   4   2   1 2020-01-13 00:20:19                3
4   5   2   2 2020-01-12 00:40:17                3
5   6   2   3 2020-01-10 00:10:53                3
6   7   3  20 2020-01-21 00:20:57               40
7   8   3  40 2020-01-20 00:10:38               40
8   9   3  60 2020-01-01 00:30:38               60

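Note: this does not actually look at the last two weeks before each row. It assigns stime to fixed two-week calendar bins (the 'W' alias is anchored on Sundays) and takes the max within each (id2, bin) group, so it only matches the desired result when the rows that belong together happen to fall into the same bin, as they do in this example. It may still help you, though.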
