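1. I have a DF in which date should always be greater than cop_date and fat_date.
2. If fat is NaN, the row should still be matched on date being greater than cop_date alone.
3. Only the nearest earlier (smaller) date should be considered.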
Actual DF:
id ins_id date cop cop_date fat fat_date
1234 abc 17/04/2023 1 18/05/2023 nan nan
1234 abc 16/04/2023 1 12/03/2023 nan nan
1234 abc 18/04/2023 1 23/03/2023 nan nan
1234 ghi 22/06/2023 1 27/08/2023 2 15/09/2023
1234 ghi 23/06/2023 1 22/05/2023 2 20/10/2023
1234 ghi 26/06/2023 1 19/04/2023 2 30/04/2023
1234 jkl 22/08/2023 1 26/08/2023 nan nan
1234 jkl 17/08/2023 1 13/08/2023 nan nan
1234 mno 06/05/2023 1 09/05/2023 2 10/05/2023
1234 mno 04/05/2023 1 01/05/2023 2 01/04/2023
Expected DF:
id ins_id date cop cop_date fat fat_date
1234 abc 17/04/2023 1 12/03/2023 nan nan
1234 abc 16/04/2023 1 12/03/2023 nan nan
1234 abc 18/04/2023 1 12/03/2023 nan nan
1234 ghi 22/06/2023 1 19/04/2023 2 30/04/2023
1234 ghi 23/06/2023 1 19/04/2023 2 30/04/2023
1234 ghi 26/06/2023 1 19/04/2023 2 30/04/2023
1234 jkl 22/08/2023 1 13/08/2023 nan nan
1234 jkl 17/08/2023 1 13/08/2023 nan nan
1234 mno 06/05/2023 1 01/05/2023 2 01/04/2023
1234 mno 04/05/2023 1 01/05/2023 2 01/04/2023
Query I tried:
Data = DF[(DF['date'] > DF['cop_date']) & (DF['date'] > DF['fat_date'])]
Data = Data.drop_duplicates(subset=['id', 'ins_id', 'date'], keep='first')
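The query above drops the rows that have NaN values, because any comparison against a missing fat_date evaluates to False.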
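A minimal two-row sketch of why those rows vanish, assuming the date columns are already datetime64 (missing values are NaT). The fat_ok mask is one possible way to let a missing fat_date pass the check; it is an illustration, not something from the original post, and it only repairs the filter, not the per-group dates shown in the Expected DF.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2023-04-17', '2023-06-26']),
    'cop_date': pd.to_datetime(['2023-03-12', '2023-04-19']),
    'fat_date': pd.to_datetime([None, '2023-04-30']),  # row 0 has no fat_date (NaT)
})

# Any comparison against NaT evaluates to False, so row 0 fails this test
# even though its fat_date is simply missing.
print((df['date'] > df['fat_date']).tolist())  # [False, True]

# Illustrative fix: treat a missing fat_date as passing instead of failing.
fat_ok = df['fat_date'].isna() | (df['date'] > df['fat_date'])
mask = (df['date'] > df['cop_date']) & fat_ok
print(mask.tolist())  # [True, True]
```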
Use:
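cols = ['id', 'ins_id']
# assumes date, cop_date and fat_date are already datetime64 columns
# keep only dates earlier than 'date' (others -> NaT), then broadcast the per-group minimum
df['cop_date'] = df.assign(d=df['cop_date'].where(df['cop_date'].lt(df['date']))).groupby(cols)['d'].transform('min')
df['fat_date'] = df.assign(d=df['fat_date'].where(df['fat_date'].lt(df['date']))).groupby(cols)['d'].transform('min')
print(df)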
id ins_id date cop cop_date fat fat_date
0 1234 abc 2023-04-17 1 2023-03-12 NaN NaT
1 1234 abc 2023-04-16 1 2023-03-12 NaN NaT
2 1234 abc 2023-04-18 1 2023-03-12 NaN NaT
3 1234 ghi 2023-06-22 1 2023-04-19 2.0 2023-04-30
4 1234 ghi 2023-06-23 1 2023-04-19 2.0 2023-04-30
5 1234 ghi 2023-06-26 1 2023-04-19 2.0 2023-04-30
6 1234 jkl 2023-08-22 1 2023-08-13 NaN NaT
7 1234 jkl 2023-08-17 1 2023-08-13 NaN NaT
8 1234 mno 2023-05-06 1 2023-05-01 2.0 2023-04-01
9 1234 mno 2023-05-04 1 2023-05-01 2.0 2023-04-01
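For a copy-paste reproduction, here is a self-contained sketch of the same where/groupby/transform approach, assuming every date is a day-first (dd/mm/yyyy) string in 2023. The two assignment lines are folded into a loop over both date columns. Note that transform('min') takes the earliest valid date per (id, ins_id) group, which is what reproduces the Expected DF (e.g. 12/03/2023 for every abc row), rather than the valid date nearest to each individual row.

```python
import numpy as np
import pandas as pd

# Input rows from the "Actual DF" table (dates assumed dd/mm/yyyy, all in 2023).
df = pd.DataFrame({
    'id': [1234] * 10,
    'ins_id': ['abc'] * 3 + ['ghi'] * 3 + ['jkl'] * 2 + ['mno'] * 2,
    'date': ['17/04/2023', '16/04/2023', '18/04/2023', '22/06/2023', '23/06/2023',
             '26/06/2023', '22/08/2023', '17/08/2023', '06/05/2023', '04/05/2023'],
    'cop': [1] * 10,
    'cop_date': ['18/05/2023', '12/03/2023', '23/03/2023', '27/08/2023', '22/05/2023',
                 '19/04/2023', '26/08/2023', '13/08/2023', '09/05/2023', '01/05/2023'],
    'fat': [np.nan, np.nan, np.nan, 2, 2, 2, np.nan, np.nan, 2, 2],
    'fat_date': [np.nan, np.nan, np.nan, '15/09/2023', '20/10/2023', '30/04/2023',
                 np.nan, np.nan, '10/05/2023', '01/04/2023'],
})

# Parse the day-first strings; the missing fat_date values become NaT.
date_cols = ['date', 'cop_date', 'fat_date']
df[date_cols] = df[date_cols].apply(pd.to_datetime, format='%d/%m/%Y')

cols = ['id', 'ins_id']
for c in ['cop_date', 'fat_date']:
    # Dates not earlier than the row's own 'date' become NaT; the earliest
    # surviving date is then broadcast to every row of its (id, ins_id) group.
    df[c] = (df.assign(d=df[c].where(df[c].lt(df['date'])))
               .groupby(cols)['d']
               .transform('min'))

print(df)
```

Groups with no valid fat_date at all (abc and jkl) stay NaT, because the minimum of an all-NaT group is NaT.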