Pandas - Multiple slices of a grouped time-series DataFrame


What I have:

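The DataFrame df has three columns (Id, Item and Timestamp). Each subject has a unique Id, and each Item is recorded at a particular date and time (Timestamp). A second DataFrame, df_ref, holds the datetime ranges (Start and End) used to slice df for each subject Id.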

df

         Id      Item      Timestamp
   0     1       aaa       2011-03-15 14:21:00
   1     1       raa       2012-05-03 04:34:01
   2     1       baa       2013-05-08 22:21:29
   3     1       boo       2015-12-24 21:53:41
   4     1       afr       2016-04-14 12:28:26
   5     1       aud       2017-05-10 11:58:02
   6     2       boo       2004-06-22 22:20:58
   7     2       aaa       2005-11-16 07:00:00
   8     2       ige       2006-06-28 17:09:18
   9     2       baa       2008-05-22 21:28:00
   10    2       boo       2017-06-08 23:31:06
   11    3       ige       2011-06-30 13:14:21
   12    3       afr       2013-06-11 01:38:48
   13    3       gui       2013-06-21 23:14:26
   14    3       loo       2014-06-10 15:15:42
   15    3       boo       2015-01-23 02:08:35
   16    3       afr       2015-04-15 00:15:23
   17    3       aaa       2016-02-16 10:26:03
   18    3       aaa       2016-06-10 01:11:15
   19    3       ige       2016-07-18 11:41:18
   20    3       boo       2016-12-06 19:14:00
   21    4       gui       2016-01-05 09:19:50
   22    4       aaa       2016-12-09 14:49:50  
   23    4       ige       2016-11-01 08:23:18    

df_ref

         Id     Start                   End
    0    1      2013-03-12 00:00:00     2016-05-30 15:20:36
    1    2      2005-06-05 08:51:22     2007-02-24 00:00:00
    2    3      2011-05-14 10:11:28     2013-12-31 17:04:55
    3    3      2015-03-29 12:18:31     2016-07-26 00:00:00

What I want:

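Slice df using the datetime ranges given in df_ref for each Id (group by Id), and concatenate the slices into a new DataFrame. Note that a subject can have more than one datetime range (in this example, Id = 3 has two).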

df_expected

         Id      Item      Timestamp
     0   1       baa       2013-05-08 22:21:29
     1   1       boo       2015-12-24 21:53:41
     2   1       afr       2016-04-14 12:28:26
     3   2       aaa       2005-11-16 07:00:00
     4   2       ige       2006-06-28 17:09:18
     5   3       ige       2011-06-30 13:14:21
     6   3       afr       2013-06-11 01:38:48
     7   3       gui       2013-06-21 23:14:26
     8   3       afr       2015-04-15 00:15:23
     9   3       aaa       2016-02-16 10:26:03
     10  3       aaa       2016-06-10 01:11:15
     11  3       ige       2016-07-18 11:41:18

What I have done so far:

I based my code on this post (Time series multiple slice). I modified that code because it lacks the groupby element I need.

My code:

import pandas as pd

df['Timestamp'] = pd.to_datetime(df.Timestamp, format='%Y-%m-%d %H:%M:%S')

x = pd.DataFrame()
for pid in df_ref.Id.unique():
    # Compares the whole df['Timestamp'] column with the whole Start/End columns of df_ref
    selection = df[(df['Id']== pid) & (df['Timestamp']>= df_ref['Start']) & (df['Timestamp']<= df_ref['End'])]
    x = x.append(selection)

The code above raises this error:

ValueError: Can only compare identically-labeled Series objects
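The error comes from comparing df['Timestamp'] with an entire column of the reference frame: the two Series have different lengths and index labels, so pandas refuses to compare them element by element. The following is a minimal sketch, using a small stand-in subset of the question's data, that first reproduces the failing comparison and then loops over the rows of df_ref so that each comparison uses scalar Start and End bounds. It collects the pieces with pd.concat instead of DataFrame.append, which was removed in pandas 2.0. The names error_message and pieces are illustrative only.

```python
import pandas as pd

# Stand-in data: a small subset of the question's df and df_ref.
df = pd.DataFrame({
    "Id": [1, 1, 3, 3, 3, 4],
    "Item": ["aaa", "baa", "ige", "loo", "afr", "gui"],
    "Timestamp": pd.to_datetime([
        "2011-03-15 14:21:00", "2013-05-08 22:21:29",
        "2011-06-30 13:14:21", "2014-06-10 15:15:42",
        "2015-04-15 00:15:23", "2016-01-05 09:19:50",
    ]),
})
df_ref = pd.DataFrame({
    "Id": [1, 3, 3],
    "Start": pd.to_datetime([
        "2013-03-12 00:00:00", "2011-05-14 10:11:28", "2015-03-29 12:18:31",
    ]),
    "End": pd.to_datetime([
        "2016-05-30 15:20:36", "2013-12-31 17:04:55", "2016-07-26 00:00:00",
    ]),
})

# Reproduce the problem: the two Series carry different index labels
# (6 rows vs. 3 rows), so the comparison raises ValueError.
try:
    df["Timestamp"] >= df_ref["Start"]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Loop variant: one iteration per (Id, Start, End) row of df_ref, so an Id
# with several ranges is handled once per range and the bounds are scalars.
pieces = []
for row in df_ref.itertuples(index=False):
    mask = (df["Id"] == row.Id) & df["Timestamp"].between(row.Start, row.End)
    pieces.append(df[mask])

x = pd.concat(pieces, ignore_index=True)
print(x)
```

Iterating over rows of df_ref rather than over unique Id values is what lets Id = 3 contribute two separate slices. For large frames a vectorised approach is likely to be faster than this loop.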
1 Answer

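First use merge with its default inner join; for a repeated Id this also creates every combination of a df row with that Id's ranges. Then, in a single DataFrame.loc step, filter the rows with between and keep only the original columns with df.columns: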

df1 = df.merge(df_ref, on='Id')
df2 = df1.loc[df1['Timestamp'].between(df1['Start'], df1['End']), df.columns]
print (df2)
    Id Item           Timestamp
2    1  baa 2013-05-08 22:21:29
3    1  boo 2015-12-24 21:53:41
4    1  afr 2016-04-14 12:28:26
7    2  aaa 2005-11-16 07:00:00
8    2  ige 2006-06-28 17:09:18
11   3  ige 2011-06-30 13:14:21
13   3  afr 2013-06-11 01:38:48
15   3  gui 2013-06-21 23:14:26
22   3  afr 2015-04-15 00:15:23
24   3  aaa 2016-02-16 10:26:03
26   3  aaa 2016-06-10 01:11:15
28   3  ige 2016-07-18 11:41:18
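The filtered result keeps the row labels of the merged frame (2, 3, 4, 7, ...), whereas df_expected is numbered from 0. As a hedged, self-contained sketch on a small stand-in subset of the question's data, the same merge-and-filter steps can be followed by reset_index(drop=True) to renumber the rows. The names merged, in_range and result are illustrative and not part of the original answer.

```python
import pandas as pd

# Stand-in data: a small subset of the question's df and df_ref.
df = pd.DataFrame({
    "Id": [2, 2, 2, 3, 3, 3, 4],
    "Item": ["boo", "aaa", "ige", "ige", "loo", "afr", "gui"],
    "Timestamp": pd.to_datetime([
        "2004-06-22 22:20:58", "2005-11-16 07:00:00", "2006-06-28 17:09:18",
        "2011-06-30 13:14:21", "2014-06-10 15:15:42", "2015-04-15 00:15:23",
        "2016-01-05 09:19:50",
    ]),
})
df_ref = pd.DataFrame({
    "Id": [2, 3, 3],
    "Start": pd.to_datetime([
        "2005-06-05 08:51:22", "2011-05-14 10:11:28", "2015-03-29 12:18:31",
    ]),
    "End": pd.to_datetime([
        "2007-02-24 00:00:00", "2013-12-31 17:04:55", "2016-07-26 00:00:00",
    ]),
})

# Inner join on Id: each df row is paired with every range of its Id,
# and Ids that have no range in df_ref (Id 4 here) drop out.
merged = df.merge(df_ref, on="Id")

# Row-wise check that Timestamp lies inside its paired [Start, End] range
# (both bounds are inclusive by default).
in_range = merged["Timestamp"].between(merged["Start"], merged["End"])

# Keep the original columns and renumber the rows from 0.
result = merged.loc[in_range, df.columns].reset_index(drop=True)
print(result)
```

If two ranges of the same Id overlapped, a row falling inside both would appear twice in the result; calling drop_duplicates() afterwards is one way to handle that case, assuming duplicate rows are not meaningful in the data.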