How to forward-fill NaN (fillna) using a time index


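I am trying to count, for each trade, the cumulative number of each trade type (e.g. B, W, S, R, D, ...) within the past 1 hour. I group by Id and tradeType and use the time index.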

I used groupby(['Id','tradeType']) and rolling('60T') to count, for every trade, how many trades of each type occurred within one hour.

DataFrame

                    Id  tradeType
timestamp
2018-07-17 16:59:57 1   D     
2018-07-17 17:30:31 1   W     
2018-07-16 15:18:18 2   B     
2018-07-16 15:20:19 2   S     
2018-07-16 15:21:37 2   B     
2018-07-16 15:21:47 2   S    
2018-07-16 15:24:01 2   B    
2018-07-16 15:24:07 2   S    
2018-07-16 15:24:29 2   B    
2018-07-16 15:24:35 2   S     
2018-07-16 15:24:47 2   B     
2018-07-16 15:24:54 2   S    
2018-07-16 15:29:23 2   R     
2018-07-16 15:39:24 2   R     
2018-07-16 15:48:23 2   R     
2018-07-16 16:23:24 2   D     
2018-07-17 12:02:39 2   D    
2018-07-17 12:03:34 2   W    
2018-07-17 12:22:39 2   B    
2018-07-17 12:23:44 2   S
# For each trade type, count the same-type trades per Id in the trailing 60 minutes.
# Only rows of that type are selected, so every other row is NaN after assignment.
for t in ['B', 'S', 'D', 'W', 'R']:
    df[t + '_count_60T'] = (df[df['tradeType'] == t]
                            .groupby(['Id', 'tradeType'])['Id']
                            .transform(lambda x: x.rolling('60T').count()))

After running the rolling code:

                    Id  tradeType   B   S   D   W   R   (_count_60T)
timestamp                           
2018-07-17 16:59:57 1   D           nan nan 1   nan nan
2018-07-17 17:30:31 1   W           nan nan nan 1   nan
2018-07-16 15:18:18 2   B           1   nan nan nan nan
2018-07-16 15:20:19 2   S           nan 1   nan nan nan
2018-07-16 15:21:37 2   B           2   nan nan nan nan
2018-07-16 15:21:47 2   S           nan 2   nan nan nan
2018-07-16 15:24:01 2   B           3   nan nan nan nan
2018-07-16 15:24:07 2   S           nan 3   nan nan nan
2018-07-16 15:24:29 2   B           4   nan nan nan nan
2018-07-16 15:24:35 2   S           nan 4   nan nan nan
2018-07-16 15:24:47 2   B           5   nan nan nan nan
2018-07-16 15:24:54 2   S           nan 5   nan nan nan
2018-07-16 15:29:23 2   R           nan nan nan nan 1
2018-07-16 15:39:24 2   R           nan nan nan nan 2
2018-07-16 15:48:23 2   R           nan nan nan nan 3
2018-07-16 16:23:24 2   D           nan nan 1   nan nan
2018-07-17 12:02:39 2   D           nan nan 1   nan nan
2018-07-17 12:03:34 2   W           nan nan nan 1   nan
2018-07-17 12:22:39 2   B           1   nan nan nan nan
2018-07-17 12:23:44 2   S           nan 1   nan nan nan     

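Next, I considered computing all value counts over each trade's 1-hour window and filling the NaNs with the correct values. This turned out to be hard.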
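A plain forward fill cannot produce the desired values. A count carried forward never expires after its trades drop out of the 60-minute window. The check below is a minimal sketch that uses only the Id 2 trades from 2018-07-16; the variable names are illustrative. It uses pandas' `'60min'` alias, which is equivalent to `'60T'`. The sketch forward-fills a sparse `B_count_60T`-style column and compares it with a rolling sum of a 0/1 "is B" indicator. The two agree until 16:23:24. At that point ffill still reports 5, but only 3 B trades fall in the last hour, which is the value in the desired table.

```python
import pandas as pd

# Id 2 trades on 2018-07-16, copied from the question's sample.
times = pd.to_datetime([
    "2018-07-16 15:18:18", "2018-07-16 15:20:19", "2018-07-16 15:21:37",
    "2018-07-16 15:21:47", "2018-07-16 15:24:01", "2018-07-16 15:24:07",
    "2018-07-16 15:24:29", "2018-07-16 15:24:35", "2018-07-16 15:24:47",
    "2018-07-16 15:24:54", "2018-07-16 15:29:23", "2018-07-16 15:39:24",
    "2018-07-16 15:48:23", "2018-07-16 16:23:24",
])
trade = pd.Series(list("BSBSBSBSBSRRRD"), index=times)
is_b = trade == "B"

# Sparse column like the question's B_count_60T: rolling count on B rows only,
# reindexed to all rows, so the non-B rows are NaN.
b_sparse = is_b[is_b].astype(int).rolling("60min").count().reindex(times)

# Forward fill carries the last count forward but never drops expired trades.
ffilled = b_sparse.ffill()

# Rolling sum of a 0/1 indicator over every row: B trades within (t - 60 min, t].
rolled = is_b.astype(int).rolling("60min").sum()

print(pd.DataFrame({"trade": trade, "ffill": ffilled, "rolling": rolled}))
```

So the missing values need a rolling aggregate over all rows rather than a fill.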

Is there a good way to build this table with rolling?

I want a table like the following:

                    Id  tradeType   B   S   D   W   R   (_count_60T)
timestamp                           
2018-07-17 16:59:57 1   D           0   0   1   0   0
2018-07-17 17:30:31 1   W           0   0   1   1   0
2018-07-16 15:18:18 2   B           1   0   0   0   0
2018-07-16 15:20:19 2   S           1   1   0   0   0
2018-07-16 15:21:37 2   B           2   1   0   0   0
2018-07-16 15:21:47 2   S           2   2   0   0   0
2018-07-16 15:24:01 2   B           3   2   0   0   0
2018-07-16 15:24:07 2   S           3   3   0   0   0
2018-07-16 15:24:29 2   B           4   3   0   0   0
2018-07-16 15:24:35 2   S           4   4   0   0   0
2018-07-16 15:24:47 2   B           5   4   0   0   0
2018-07-16 15:24:54 2   S           5   5   0   0   0
2018-07-16 15:29:23 2   R           5   5   0   0   1
2018-07-16 15:39:24 2   R           5   5   0   0   2
2018-07-16 15:48:23 2   R           5   5   0   0   3
2018-07-16 16:23:24 2   D           3   3   1   0   3
2018-07-17 12:02:39 2   D           0   0   1   0   0
2018-07-17 12:03:34 2   W           0   0   1   1   0
2018-07-17 12:22:39 2   B           1   0   1   1   0
2018-07-17 12:23:44 2   S           1   1   1   1   0   

Another approach I tried builds this table without rolling.

It works, but it takes a long time even for a single column (B only).

import pandas as pd
from datetime import timedelta

df = df.reset_index()
df['timestamp_before_60T'] = df['timestamp'] - timedelta(hours=1)

b_counts = []
for row in df.itertuples():
    # Trades of the same Id whose timestamp lies in [timestamp - 1 h, timestamp]
    window = df[(df['Id'] == row.Id)
                & (df['timestamp'] <= row.timestamp)
                & (df['timestamp'] >= row.timestamp_before_60T)]
    # Count only 'B' trades; each iteration scans the whole frame, so this is O(n^2)
    b_counts.append((window['tradeType'] == 'B').sum())

df['B_count_60T'] = b_counts
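If the loop's semantics are what you need, you can replace the per-row filtering with lookups on sorted arrays. This is a minimal sketch and not the answer's method. It assumes each Id's trades can be sorted by time. It applies NumPy's `searchsorted` to a cumulative sum of 0/1 flags, and `count_in_window` is a hypothetical helper name. Like the loop, it treats both window ends as inclusive. pandas' time-based `rolling` uses `(t - 60 min, t]` instead. The sample has no two trades exactly 60 minutes apart, so both give the same table here.

```python
import numpy as np
import pandas as pd

def count_in_window(times, flags, window):
    """For each i, sum flags[j] over times[i] - window <= times[j] <= times[i].

    Hypothetical helper: `times` must be sorted ascending; both bounds are
    inclusive, like the itertuples loop.
    """
    t = np.asarray(times, dtype="datetime64[ns]")
    csum = np.concatenate(([0], np.cumsum(flags)))    # csum[k] = sum(flags[:k])
    hi = np.searchsorted(t, t, side="right")           # first j with times[j] > times[i]
    lo = np.searchsorted(t, t - window, side="left")   # first j with times[j] >= times[i] - window
    return csum[hi] - csum[lo]

rows = [
    ("2018-07-17 16:59:57", 1, "D"), ("2018-07-17 17:30:31", 1, "W"),
    ("2018-07-16 15:18:18", 2, "B"), ("2018-07-16 15:20:19", 2, "S"),
    ("2018-07-16 15:21:37", 2, "B"), ("2018-07-16 15:21:47", 2, "S"),
    ("2018-07-16 15:24:01", 2, "B"), ("2018-07-16 15:24:07", 2, "S"),
    ("2018-07-16 15:24:29", 2, "B"), ("2018-07-16 15:24:35", 2, "S"),
    ("2018-07-16 15:24:47", 2, "B"), ("2018-07-16 15:24:54", 2, "S"),
    ("2018-07-16 15:29:23", 2, "R"), ("2018-07-16 15:39:24", 2, "R"),
    ("2018-07-16 15:48:23", 2, "R"), ("2018-07-16 16:23:24", 2, "D"),
    ("2018-07-17 12:02:39", 2, "D"), ("2018-07-17 12:03:34", 2, "W"),
    ("2018-07-17 12:22:39", 2, "B"), ("2018-07-17 12:23:44", 2, "S"),
]
df = pd.DataFrame(rows, columns=["timestamp", "Id", "tradeType"])
df["timestamp"] = pd.to_datetime(df["timestamp"])
# Sort by Id, then time, so every Id's timestamps are ascending (the sample already is).
df = df.sort_values(["Id", "timestamp"], kind="stable").reset_index(drop=True)

hour = np.timedelta64(1, "h")
ts = df["timestamp"].to_numpy()
for trade_type in ["B", "S", "D", "W", "R"]:
    flags = (df["tradeType"] == trade_type).to_numpy(dtype=int)
    out = np.zeros(len(df), dtype=int)
    # groupby(...).indices maps each Id to the ascending row positions of its trades.
    for _, pos in df.groupby("Id").indices.items():
        out[pos] = count_in_window(ts[pos], flags[pos], hour)
    df[f"{trade_type}_count_60T"] = out

print(df)
```

Each lookup is a binary search, so the cost is roughly O(n log n) per trade type instead of one full-frame scan per row.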


python pandas timestamp time-series nan
1 answer

IIUC, use get_dummies, then groupby Id and rolling.
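The following is a minimal sketch of that suggestion on the question's sample data, assuming a reasonably recent pandas. `get_dummies` turns tradeType into one 0/1 column per type. A time-based rolling sum per Id then counts every type at once, so no NaNs appear and nothing needs filling. The sketch uses the `'60min'` alias, which is equivalent to `'60T'`. The window is `(t - 60 min, t]`, and each Id's rows must be in time order. The `_count_60T` suffix only mirrors the question's column names.

```python
import pandas as pd

# Sample data from the question; the timestamp column becomes the index.
rows = [
    ("2018-07-17 16:59:57", 1, "D"), ("2018-07-17 17:30:31", 1, "W"),
    ("2018-07-16 15:18:18", 2, "B"), ("2018-07-16 15:20:19", 2, "S"),
    ("2018-07-16 15:21:37", 2, "B"), ("2018-07-16 15:21:47", 2, "S"),
    ("2018-07-16 15:24:01", 2, "B"), ("2018-07-16 15:24:07", 2, "S"),
    ("2018-07-16 15:24:29", 2, "B"), ("2018-07-16 15:24:35", 2, "S"),
    ("2018-07-16 15:24:47", 2, "B"), ("2018-07-16 15:24:54", 2, "S"),
    ("2018-07-16 15:29:23", 2, "R"), ("2018-07-16 15:39:24", 2, "R"),
    ("2018-07-16 15:48:23", 2, "R"), ("2018-07-16 16:23:24", 2, "D"),
    ("2018-07-17 12:02:39", 2, "D"), ("2018-07-17 12:03:34", 2, "W"),
    ("2018-07-17 12:22:39", 2, "B"), ("2018-07-17 12:23:44", 2, "S"),
]
df = pd.DataFrame(rows, columns=["timestamp", "Id", "tradeType"])
df["timestamp"] = pd.to_datetime(df["timestamp"])
df = df.set_index("timestamp")

types = ["B", "S", "D", "W", "R"]

# One 0/1 indicator column per trade type, in the question's column order.
dummies = pd.get_dummies(df["tradeType"], dtype=int).reindex(columns=types, fill_value=0)

# Per-Id time-based rolling sum: for each row, the number of trades of every
# type whose timestamp falls in (t - 60 min, t] for the same Id.
counts = dummies.groupby(df["Id"]).rolling("60min").sum().astype(int)

# Drop the Id index level so the counts align with df's timestamp index again.
result = df.join(counts.droplevel("Id").add_suffix("_count_60T"))
print(result)
```

On the sample data this reproduces the desired table from the question. One example is the 2018-07-16 16:23:24 row, which gives 3, 3, 1, 0, 3, because the earliest B and S trades are already older than one hour.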

© www.soinside.com 2019 - 2024. All rights reserved.