Calculating date differences for repeated data


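I have a dataset made up of a series of repeating events. For each Id, I want to calculate the date difference from "NaN" to "Complete", i.e. for every row from the first number in Count to the last number before the count restarts.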

    Id  CreatedDate         NewValue    Count     Diff    
0   ABC 2018-11-28 09:16:15 NaN         1         0
1   ABC 2019-01-17 14:09:02 Approved    2         ?
2   ABC 2019-03-01 13:41:16 Req Def     3         ?
3   ABC 2019-03-26 10:31:00 Dev&Config  4         ?
4   ABC 2019-03-26 10:31:19 Testing     5         ?
5   ABC 2019-04-26 10:03:09 Complete    6         ?
6   EAI 2018-11-28 16:08:55 NaN         1         0
7   EAI 2018-12-03 10:06:42 Approved    2         ?
8   EAI 2019-01-18 17:15:29 Req Def     3         ?
9   EAI 2019-03-21 23:48:08 Testing     4         ?
10  EAI 2019-05-06 16:50:03 Complete    5         ?
11  BAC 2018-11-30 12:11:26 NaN         1         0
12  BAC 2018-12-03 14:22:53 Approved    2         ?
13  BAC 2018-12-19 14:00:03 Req Def     3         ?
14  BAC 2019-09-18 11:52:16 Complete    4         ?

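I tried the following code on the first group of repeated values, but got the error "'Series' object has no attribute 'to_series'". Any ideas on how to get this diff to repeat for each Id? Thanks!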

```practice_set['Diff'] = practice_set.CreatedDate.to_series().diff().dt.seconds.div(60, fill_value=0)```
python-3.x pandas pandas-groupby
1 Answer

I think this may be what you are looking for.

Your DataFrame:

     Id         CreatedDate    NewValue  Count
0   ABC 2018-11-28 09:16:15         NaN      1
1   ABC 2019-01-17 14:09:02    Approved      2
2   ABC 2019-03-01 13:41:16     Req Def      3
3   ABC 2019-03-26 10:31:00  Dev&Config      4
4   ABC 2019-03-26 10:31:19     Testing      5
5   ABC 2019-04-26 10:03:09    Complete      6
6   EAI 2018-11-28 16:08:55         NaN      1
7   EAI 2018-12-03 10:06:42    Approved      2
8   EAI 2019-01-18 17:15:29     Req Def      3
9   EAI 2019-03-21 23:48:08     Testing      4
10  EAI 2019-05-06 16:50:03    Complete      5
11  BAC 2018-11-30 12:11:26         NaN      1
12  BAC 2018-12-03 14:22:53    Approved      2
13  BAC 2018-12-19 14:00:03     Req Def      3
14  BAC 2019-09-18 11:52:16    Complete      4

Convert the 'CreatedDate' column to pandas' datetime type:

df['CreatedDate'] = pd.to_datetime(df['CreatedDate'])

Then group by 'Id' and take the difference between consecutive 'CreatedDate' values within each group:

df['Diff'] = df.groupby('Id')['CreatedDate'].diff()

df now looks like this:

     Id         CreatedDate        ...        Count              Diff
0   ABC 2018-11-28 09:16:15        ...            1               NaT
1   ABC 2019-01-17 14:09:02        ...            2  50 days 04:52:47
2   ABC 2019-03-01 13:41:16        ...            3  42 days 23:32:14
3   ABC 2019-03-26 10:31:00        ...            4  24 days 20:49:44
4   ABC 2019-03-26 10:31:19        ...            5   0 days 00:00:19
5   ABC 2019-04-26 10:03:09        ...            6  30 days 23:31:50
6   EAI 2018-11-28 16:08:55        ...            1               NaT
7   EAI 2018-12-03 10:06:42        ...            2   4 days 17:57:47
8   EAI 2019-01-18 17:15:29        ...            3  46 days 07:08:47
9   EAI 2019-03-21 23:48:08        ...            4  62 days 06:32:39
10  EAI 2019-05-06 16:50:03        ...            5  45 days 17:01:55
11  BAC 2018-11-30 12:11:26        ...            1               NaT
12  BAC 2018-12-03 14:22:53        ...            2   3 days 02:11:27
13  BAC 2018-12-19 14:00:03        ...            3  15 days 23:37:10
14  BAC 2019-09-18 11:52:16        ...            4 272 days 21:52:13
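
The steps above can be reproduced end to end with a minimal sketch like the one below. It uses only five rows copied from the question to stay short. The `Elapsed` column is a hypothetical addition, not part of the original answer: it assumes that "NaN to Complete" may mean the total time since the earliest row of each Id rather than the step-to-step difference.

```python
import pandas as pd

# A few rows copied from the question's data, to keep the sketch short.
df = pd.DataFrame({
    "Id": ["ABC", "ABC", "ABC", "EAI", "EAI"],
    "CreatedDate": [
        "2018-11-28 09:16:15", "2019-01-17 14:09:02", "2019-03-01 13:41:16",
        "2018-11-28 16:08:55", "2018-12-03 10:06:42",
    ],
})
df["CreatedDate"] = pd.to_datetime(df["CreatedDate"])

# Step-to-step difference within each Id (NaT on the first row of each group).
df["Diff"] = df.groupby("Id")["CreatedDate"].diff()

# Hypothetical extra column: time elapsed since the earliest row of the same Id.
df["Elapsed"] = df["CreatedDate"] - df.groupby("Id")["CreatedDate"].transform("min")

print(df)
```

On the last row of each Id, `Elapsed` holds the full duration from the first event to the final one.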

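If you don't want the 'NaT' values, fill the 'Diff' column with a zero timedelta (calling fillna(0) on the whole DataFrame would also overwrite the NaN entries in 'NewValue'):

df['Diff'] = df['Diff'].fillna(pd.Timedelta(0))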
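
The attempt in the question divided `.dt.seconds` by 60 to get minutes. As a hedged sketch of that conversion: `.dt.seconds` returns only the seconds component of each timedelta and drops whole days, so `.dt.total_seconds()` is probably what is wanted here. The sample values are taken from the output above, and the variable names are illustrative.

```python
import pandas as pd

# Two differences from the output above, plus a missing value (NaT).
diff = pd.Series(pd.to_timedelta(["50 days 04:52:47", "0 days 00:00:19", None]))

# .dt.seconds is only the seconds component (less than one day); whole days are dropped.
seconds_component = diff.dt.seconds

# .dt.total_seconds() covers the whole duration, so dividing by 60 gives minutes.
# NaT is replaced with a zero timedelta first, so the first row of each group becomes 0.
minutes = diff.fillna(pd.Timedelta(0)).dt.total_seconds() / 60

print(minutes)
```

Assigning the result with `df['Diff'] = ...` would store the per-Id differences as plain floating-point minutes.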