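I have a dataset made up of a series of repeating events. I want to calculate the date differences from "NaN" to "Complete", that is, for every row from the first value in Count up to the last value before Count starts over.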
    Id   CreatedDate          NewValue    Count  Diff
0   ABC  2018-11-28 09:16:15  NaN         1      0
1   ABC  2019-01-17 14:09:02  Approved    2      ?
2   ABC  2019-03-01 13:41:16  Req Def     3      ?
3   ABC  2019-03-26 10:31:00  Dev&Config  4      ?
4   ABC  2019-03-26 10:31:19  Testing     5      ?
5   ABC  2019-04-26 10:03:09  Complete    6      ?
6   EAI  2018-11-28 16:08:55  NaN         1      0
7   EAI  2018-12-03 10:06:42  Approved    2      ?
8   EAI  2019-01-18 17:15:29  Req Def     3      ?
9   EAI  2019-03-21 23:48:08  Testing     4      ?
10  EAI  2019-05-06 16:50:03  Complete    5      ?
11  BAC  2018-11-30 12:11:26  NaN         1      0
12  BAC  2018-12-03 14:22:53  Approved    2      ?
13  BAC  2018-12-19 14:00:03  Req Def     3      ?
14  BAC  2019-09-18 11:52:16  Complete    4      ?
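I tried the following code on the first group of repeated values, but I get the error "'Series' object has no attribute 'to_series'". Any ideas on how to get this difference calculation to repeat for each Id? Thanks!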
```practice_set['Diff'] = practice_set.CreatedDate.to_series().diff().dt.seconds.div(60, fill_value=0)```
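The error comes from `to_series()` being a method of `Index` (for example a `DatetimeIndex`), not of `Series`. A column such as `practice_set.CreatedDate` is already a Series and supports `.diff()` directly. Separately, `.dt.seconds` returns only the seconds component of each Timedelta (0 to 86399), silently dropping whole days, whereas `.dt.total_seconds()` returns the full duration. A minimal sketch of the single-group version, assuming `practice_set` holds only the ABC rows copied from the table above:

```python
import pandas as pd

# ABC rows from the question; assumed here to be the whole frame
practice_set = pd.DataFrame({
    "Id": ["ABC"] * 6,
    "CreatedDate": [
        "2018-11-28 09:16:15", "2019-01-17 14:09:02", "2019-03-01 13:41:16",
        "2019-03-26 10:31:00", "2019-03-26 10:31:19", "2019-04-26 10:03:09",
    ],
})
practice_set["CreatedDate"] = pd.to_datetime(practice_set["CreatedDate"])

# A Series already has .diff(), so no .to_series() is needed.
# total_seconds() keeps the day component that .dt.seconds would drop;
# the first row's NaT becomes NaN and is then filled with 0.
practice_set["Diff"] = (
    practice_set["CreatedDate"].diff().dt.total_seconds().div(60).fillna(0)
)
print(practice_set)
```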
I think this might be what you are looking for.
Your DataFrame:
    Id   CreatedDate          NewValue    Count
0   ABC  2018-11-28 09:16:15  NaN         1
1   ABC  2019-01-17 14:09:02  Approved    2
2   ABC  2019-03-01 13:41:16  Req Def     3
3   ABC  2019-03-26 10:31:00  Dev&Config  4
4   ABC  2019-03-26 10:31:19  Testing     5
5   ABC  2019-04-26 10:03:09  Complete    6
6   EAI  2018-11-28 16:08:55  NaN         1
7   EAI  2018-12-03 10:06:42  Approved    2
8   EAI  2019-01-18 17:15:29  Req Def     3
9   EAI  2019-03-21 23:48:08  Testing     4
10  EAI  2019-05-06 16:50:03  Complete    5
11  BAC  2018-11-30 12:11:26  NaN         1
12  BAC  2018-12-03 14:22:53  Approved    2
13  BAC  2018-12-19 14:00:03  Req Def     3
14  BAC  2019-09-18 11:52:16  Complete    4
Convert the 'CreatedDate' column to the pandas datetime type:
df['CreatedDate'] = pd.to_datetime(df['CreatedDate'])
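Passing an explicit `format` is optional. A minimal sketch, assuming every value uses the `YYYY-MM-DD HH:MM:SS` layout shown in the question; once parsed, subtracting two timestamps yields a `Timedelta`, which is what `diff()` relies on:

```python
import pandas as pd

df = pd.DataFrame({"CreatedDate": ["2018-11-28 09:16:15", "2019-01-17 14:09:02"]})

# Assumed layout "YYYY-MM-DD HH:MM:SS"; a string that does not match raises an error
df["CreatedDate"] = pd.to_datetime(df["CreatedDate"], format="%Y-%m-%d %H:%M:%S")

# Parsed timestamps subtract to a Timedelta
gap = df["CreatedDate"].iloc[1] - df["CreatedDate"].iloc[0]
print(gap)  # 50 days 04:52:47
```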
Then group by 'Id' and take the difference between consecutive 'CreatedDate' values within each group:
df['Diff'] = df.groupby('Id')['CreatedDate'].diff()
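The question's Diff column expects a number with 0 on each group's first row, and the original attempt divided seconds by 60. A sketch combining that with the per-Id groupby, assuming minutes are the desired unit (NewValue is left out because it is not needed for the time differences):

```python
import pandas as pd

# Id and CreatedDate columns copied from the question
df = pd.DataFrame({
    "Id": ["ABC"] * 6 + ["EAI"] * 5 + ["BAC"] * 4,
    "CreatedDate": pd.to_datetime([
        "2018-11-28 09:16:15", "2019-01-17 14:09:02", "2019-03-01 13:41:16",
        "2019-03-26 10:31:00", "2019-03-26 10:31:19", "2019-04-26 10:03:09",
        "2018-11-28 16:08:55", "2018-12-03 10:06:42", "2019-01-18 17:15:29",
        "2019-03-21 23:48:08", "2019-05-06 16:50:03",
        "2018-11-30 12:11:26", "2018-12-03 14:22:53", "2018-12-19 14:00:03",
        "2019-09-18 11:52:16",
    ]),
})

# diff() runs separately inside each Id group, so every group's first row is NaT;
# convert to minutes (as in the question's code) and put 0 on those first rows
df["Diff"] = (
    df.groupby("Id")["CreatedDate"].diff().dt.total_seconds().div(60).fillna(0)
)
print(df)
```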
The DataFrame now looks like this (pandas elides the middle columns with "..."):
    Id   CreatedDate          ...  Count  Diff
0   ABC  2018-11-28 09:16:15  ...  1      NaT
1   ABC  2019-01-17 14:09:02  ...  2      50 days 04:52:47
2   ABC  2019-03-01 13:41:16  ...  3      42 days 23:32:14
3   ABC  2019-03-26 10:31:00  ...  4      24 days 20:49:44
4   ABC  2019-03-26 10:31:19  ...  5      0 days 00:00:19
5   ABC  2019-04-26 10:03:09  ...  6      30 days 23:31:50
6   EAI  2018-11-28 16:08:55  ...  1      NaT
7   EAI  2018-12-03 10:06:42  ...  2      4 days 17:57:47
8   EAI  2019-01-18 17:15:29  ...  3      46 days 07:08:47
9   EAI  2019-03-21 23:48:08  ...  4      62 days 06:32:39
10  EAI  2019-05-06 16:50:03  ...  5      45 days 17:01:55
11  BAC  2018-11-30 12:11:26  ...  1      NaT
12  BAC  2018-12-03 14:22:53  ...  2      3 days 02:11:27
13  BAC  2018-12-19 14:00:03  ...  3      15 days 23:37:10
14  BAC  2019-09-18 11:52:16  ...  4      272 days 21:52:13
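If what you need is the overall span from the "NaN" row to the "Complete" row, rather than the step between consecutive rows, a sketch along the same lines subtracts each group's first timestamp. It assumes rows are in date order within each Id, so the first row is the NaN one and the last is Complete; only the EAI and BAC rows are used to keep it short:

```python
import pandas as pd

# EAI and BAC rows copied from the question
df = pd.DataFrame({
    "Id": ["EAI"] * 5 + ["BAC"] * 4,
    "CreatedDate": pd.to_datetime([
        "2018-11-28 16:08:55", "2018-12-03 10:06:42", "2019-01-18 17:15:29",
        "2019-03-21 23:48:08", "2019-05-06 16:50:03",
        "2018-11-30 12:11:26", "2018-12-03 14:22:53", "2018-12-19 14:00:03",
        "2019-09-18 11:52:16",
    ]),
})

g = df.groupby("Id")["CreatedDate"]

# Running time since each Id's first ("NaN") row; on the Complete row
# this equals the whole NaN -> Complete span
df["Elapsed"] = df["CreatedDate"] - g.transform("first")

# One value per Id: last timestamp minus first
total = g.last() - g.first()
print(df)
print(total)
```

`Elapsed` is the running sum of the per-row `Diff` values within each Id, so either column can be derived from the other.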
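If you don't want 'NaT' in Diff, fill just that column with a zero Timedelta (calling fillna(0) on the whole frame would also overwrite the NaN values in NewValue):
df['Diff'] = df['Diff'].fillna(pd.Timedelta(0))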
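Grouping on Id assumes each Id goes through exactly one NaN-to-Complete cycle, as in the sample. If an Id could repeat the cycle (an assumption, not shown in the data), a sketch that follows the question's description literally starts a new group each time Count returns to 1:

```python
import pandas as pd

# EAI and BAC rows from the question, keeping the Count column
df = pd.DataFrame({
    "Id": ["EAI"] * 5 + ["BAC"] * 4,
    "CreatedDate": pd.to_datetime([
        "2018-11-28 16:08:55", "2018-12-03 10:06:42", "2019-01-18 17:15:29",
        "2019-03-21 23:48:08", "2019-05-06 16:50:03",
        "2018-11-30 12:11:26", "2018-12-03 14:22:53", "2018-12-19 14:00:03",
        "2019-09-18 11:52:16",
    ]),
    "Count": [1, 2, 3, 4, 5, 1, 2, 3, 4],
})

# A new cycle starts wherever Count goes back to 1; cumsum numbers the cycles 1, 2, ...
cycle = (df["Count"] == 1).cumsum()

# Same per-row difference as grouping on Id, but keyed on the cycle number;
# each cycle's first row gets a zero Timedelta instead of NaT
df["Diff"] = df.groupby(cycle)["CreatedDate"].diff().fillna(pd.Timedelta(0))
print(df)
```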