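I am trying to set up an indicator that shows when a new application caused an older application to be rejected.
If any rejected_time within a personal_id falls within 5 minutes after a creation_timestamp, the rejection was caused by the new application. Based on this, I need to create the column "new_application_causes_rejection", as shown in the example.
There are hundreds of thousands of personal IDs; most of them have several application IDs, and the number of rows per application ID varies.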
personal_id | application_id | creation_timestamp | approved_amount | rejected_time | new_application_causes_rejection |
---|---|---|---|---|---|
5a | 694f | 2023-01-24 13:01:07.939534 | 8000.0 | 2023-01-24 13:13:15.499000 | 0 |
5a | 694f | 2023-01-24 13:01:07.939534 | 8000.0 | 2023-01-24 14:38:02.359000 | 1 |
5a | 694f | 2023-01-24 13:01:07.939534 | 8000.0 | 2023-01-24 14:37:18.616000 | 1 |
5a | 694f | 2023-01-24 13:01:07.939534 | NaN | 2023-01-24 13:03:59.626000 | 0 |
5a | 43fa | 2023-01-24 14:36:08.287521 | NaN | 2023-01-24 14:37:22.096000 | 0 |
5a | 43fa | 2023-01-24 14:36:08.287521 | 13000.0 | 2023-01-24 14:39:31.750000 | 1 |
5a | 43fa | 2023-01-24 14:36:08.287521 | 13000.0 | 2023-02-02 08:42:26.980106 | 1 |
5a | 43fa | 2023-01-24 14:36:08.287521 | NaN | 2023-01-24 14:37:22.948214 | 0 |
5a | a4b6 | 2023-01-24 14:38:42.625969 | 5000.0 | 2023-02-02 08:42:26.980106 | 0 |
5a | a4b7 | 2023-01-24 14:38:42.625969 | NaN | 2023-01-24 14:38:46.922000 | 0 |
5a | a4b8 | 2023-01-24 14:38:42.625969 | 8000.0 | 2023-02-02 08:42:26.980106 | 0 |
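Comparing rejected_time with creation_timestamp row by row, I get a different output from your expected column: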
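import pandas as pd

# Parse both columns as datetimes so they can be subtracted
df['creation_timestamp'] = pd.to_datetime(df['creation_timestamp'])
df['rejected_time'] = pd.to_datetime(df['rejected_time'])
# Row-wise check: 1 if rejected_time is less than 5 minutes after the same row's creation_timestamp
df['new'] = df['rejected_time'].sub(df['creation_timestamp']).lt(pd.Timedelta('5 min')).astype(int)
print(df)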
personal_id application_id creation_timestamp approved_amount \
0 5a 694f 2023-01-24 13:01:07.939534 8000.0
1 5a 694f 2023-01-24 13:01:07.939534 8000.0
2 5a 694f 2023-01-24 13:01:07.939534 8000.0
3 5a 694f 2023-01-24 13:01:07.939534 NaN
4 5a 43fa 2023-01-24 14:36:08.287521 NaN
5 5a 43fa 2023-01-24 14:36:08.287521 13000.0
6 5a 43fa 2023-01-24 14:36:08.287521 13000.0
7 5a 43fa 2023-01-24 14:36:08.287521 NaN
8 5a a4b6 2023-01-24 14:38:42.625969 5000.0
9 5a a4b7 2023-01-24 14:38:42.625969 NaN
10 5a a4b8 2023-01-24 14:38:42.625969 8000.0
rejected_time new_application_causes_rejection new
0 2023-01-24 13:13:15.499000 0 0
1 2023-01-24 14:38:02.359000 1 0
2 2023-01-24 14:37:18.616000 1 0
3 2023-01-24 13:03:59.626000 0 1
4 2023-01-24 14:37:22.096000 0 1
5 2023-01-24 14:39:31.750000 1 1
6 2023-02-02 08:42:26.980106 1 0
7 2023-01-24 14:37:22.948214 0 1
8 2023-02-02 08:42:26.980106 0 0
9 2023-01-24 14:38:46.922000 0 1
10 2023-02-02 08:42:26.980106 0 0
Details:
print (df['rejected_time'].sub(df['creation_timestamp']))
0 0 days 00:12:07.559466
1 0 days 01:36:54.419466
2 0 days 01:36:10.676466
3 0 days 00:02:51.686466
4 0 days 00:01:13.808479
5 0 days 00:03:23.462479
6 8 days 18:06:18.692585
7 0 days 00:01:14.660693
8 8 days 18:03:44.354137
9 0 days 00:00:04.296031
10 8 days 18:03:44.354137
dtype: timedelta64[ns]
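
The row-wise difference above only compares each rejection with the creation time of its own application. If the intent is instead to compare each rejected_time with the creation_timestamp of a newer application of the same personal_id, a minimal sketch could look like the one below. It rests on two assumptions: "newer" means a strictly later creation_timestamp, and the 5-minute window is inclusive. The function name `flag_rejections_by_newer_application` and the helper columns `row`, `other_creation` and `check` are made up for illustration. On the sample data this agrees with the expected column except for index 6, whose rejected_time is more than 8 days after every creation_timestamp.

```python
import pandas as pd


def flag_rejections_by_newer_application(df, window="5min"):
    """Hypothetical helper: 1 where rejected_time falls within `window`
    after the creation_timestamp of a newer application of the same person."""
    window = pd.Timedelta(window)

    rows = df[["personal_id", "creation_timestamp", "rejected_time"]].copy()
    rows["row"] = range(len(rows))  # positional row id that survives the merge

    # One line per distinct (person, creation time)
    creations = (
        df[["personal_id", "creation_timestamp"]]
        .drop_duplicates()
        .rename(columns={"creation_timestamp": "other_creation"})
    )

    # Pair every row with every distinct creation time of the same person
    pairs = rows.merge(creations, on="personal_id")
    delay = pairs["rejected_time"] - pairs["other_creation"]
    hit = (
        (pairs["other_creation"] > pairs["creation_timestamp"])  # assumption: strictly newer
        & (delay >= pd.Timedelta(0))  # rejection happened after the newer creation
        & (delay <= window)           # assumption: window is inclusive
    )

    flagged = pairs.loc[hit, "row"].unique()
    return rows["row"].isin(flagged).astype(int).rename("caused_by_newer_application")


# Sample data copied from the question
df = pd.DataFrame({
    "personal_id": ["5a"] * 11,
    "application_id": ["694f"] * 4 + ["43fa"] * 4 + ["a4b6", "a4b7", "a4b8"],
    "creation_timestamp": pd.to_datetime(
        ["2023-01-24 13:01:07.939534"] * 4
        + ["2023-01-24 14:36:08.287521"] * 4
        + ["2023-01-24 14:38:42.625969"] * 3
    ),
    "rejected_time": pd.to_datetime([
        "2023-01-24 13:13:15.499000", "2023-01-24 14:38:02.359000",
        "2023-01-24 14:37:18.616000", "2023-01-24 13:03:59.626000",
        "2023-01-24 14:37:22.096000", "2023-01-24 14:39:31.750000",
        "2023-02-02 08:42:26.980106", "2023-01-24 14:37:22.948214",
        "2023-02-02 08:42:26.980106", "2023-01-24 14:38:46.922000",
        "2023-02-02 08:42:26.980106",
    ]),
})

df["check"] = flag_rejections_by_newer_application(df)
print(df[["application_id", "rejected_time", "check"]])
```

The self-merge pairs every row with every distinct creation time of the same personal_id, so the intermediate frame grows with the number of applications per person. With a handful of applications per person that should be manageable, but it is worth checking memory use on the full data.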