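I am trying to compute, for each trade, the cumulative count of every trade type (e.g. B, W, S, R, D, ...) that occurred for the same Id during the past hour, using a time index.
I used groupby(['Id','tradeType']) together with rolling('60T') to count, for each trade, the cumulative number of trades of each type within one hour.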
index Id tradeType
timestamp
2018-07-17 16:59:57 1 D
2018-07-17 17:30:31 1 W
2018-07-16 15:18:18 2 B
2018-07-16 15:20:19 2 S
2018-07-16 15:21:37 2 B
2018-07-16 15:21:47 2 S
2018-07-16 15:24:01 2 B
2018-07-16 15:24:07 2 S
2018-07-16 15:24:29 2 B
2018-07-16 15:24:35 2 S
2018-07-16 15:24:47 2 B
2018-07-16 15:24:54 2 S
2018-07-16 15:29:23 2 R
2018-07-16 15:39:24 2 R
2018-07-16 15:48:23 2 R
2018-07-16 16:23:24 2 D
2018-07-17 12:02:39 2 D
2018-07-17 12:03:34 2 W
2018-07-17 12:22:39 2 B
2018-07-17 12:23:44 2 S
df['B_count_60T'] = df[df['tradeType']=='B'].groupby(['Id','tradeType'])['Id'].transform(lambda x: x.rolling('60T').count())
df['S_count_60T'] = df[df['tradeType']=='S'].groupby(['Id','tradeType'])['Id'].transform(lambda x: x.rolling('60T').count())
df['D_count_60T'] = df[df['tradeType']=='D'].groupby(['Id','tradeType'])['Id'].transform(lambda x: x.rolling('60T').count())
df['W_count_60T'] = df[df['tradeType']=='W'].groupby(['Id','tradeType'])['Id'].transform(lambda x: x.rolling('60T').count())
df['R_count_60T'] = df[df['tradeType']=='R'].groupby(['Id','tradeType'])['Id'].transform(lambda x: x.rolling('60T').count())
Id tradeType B S D W R (_count_60T)
timestamp
2018-07-17 16:59:57 1 D nan nan 1 nan nan
2018-07-17 17:30:31 1 W nan nan nan 1 nan
2018-07-16 15:18:18 2 B 1 nan nan nan nan
2018-07-16 15:20:19 2 S nan 1 nan nan nan
2018-07-16 15:21:37 2 B 2 nan nan nan nan
2018-07-16 15:21:47 2 S nan 2 nan nan nan
2018-07-16 15:24:01 2 B 3 nan nan nan nan
2018-07-16 15:24:07 2 S nan 3 nan nan nan
2018-07-16 15:24:29 2 B 4 nan nan nan nan
2018-07-16 15:24:35 2 S nan 4 nan nan nan
2018-07-16 15:24:47 2 B 5 nan nan nan nan
2018-07-16 15:24:54 2 S nan 5 nan nan nan
2018-07-16 15:29:23 2 R nan nan nan nan 1
2018-07-16 15:39:24 2 R nan nan nan nan 2
2018-07-16 15:48:23 2 R nan nan nan nan 3
2018-07-16 16:23:24 2 D nan nan 1 nan nan
2018-07-17 12:02:39 2 D nan nan 1 nan nan
2018-07-17 12:03:34 2 W nan nan nan 1 nan
2018-07-17 12:22:39 2 B 1 nan nan nan nan
2018-07-17 12:23:44 2 S nan 1 nan nan nan
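I then thought about filling the NaNs with the correct values by counting every trade type within the hour before each trade, but that turned out not to be easy.
Is there a good way to produce the following table with rolling?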
Id tradeType B S D W R (_count_60T)
timestamp
2018-07-17 16:59:57 1 D 0 0 1 0 0
2018-07-17 17:30:31 1 W 0 0 1 1 0
2018-07-16 15:18:18 2 B 1 0 0 0 0
2018-07-16 15:20:19 2 S 1 1 0 0 0
2018-07-16 15:21:37 2 B 2 1 0 0 0
2018-07-16 15:21:47 2 S 2 2 0 0 0
2018-07-16 15:24:01 2 B 3 2 0 0 0
2018-07-16 15:24:07 2 S 3 3 0 0 0
2018-07-16 15:24:29 2 B 4 3 0 0 0
2018-07-16 15:24:35 2 S 4 4 0 0 0
2018-07-16 15:24:47 2 B 5 4 0 0 0
2018-07-16 15:24:54 2 S 5 5 0 0 0
2018-07-16 15:29:23 2 R 5 5 0 0 1
2018-07-16 15:39:24 2 R 5 5 0 0 2
2018-07-16 15:48:23 2 R 5 5 0 0 3
2018-07-16 16:23:24 2 D 3 3 1 0 3
2018-07-17 12:02:39 2 D 0 0 1 0 0
2018-07-17 12:03:34 2 W 0 0 1 1 0
2018-07-17 12:22:39 2 B 1 0 1 1 0
2018-07-17 12:23:44 2 S 1 1 1 1 0
The following works, but it takes a very long time even for a single column.
import pandas as pd
from datetime import timedelta

df_cnt = pd.DataFrame()
df = df.reset_index()
df['timestamp_before_60T'] = df['timestamp'] - timedelta(hours=1)
for row in df.itertuples():
    col_Id = row.Id
    col_timestamp = row.timestamp
    col_timestamp_before_60T = row.timestamp_before_60T
    # Trades of the same Id within the hour ending at this trade (both ends inclusive)
    tmp = df[(df['Id']==col_Id)&(df['timestamp']<=col_timestamp)&(df['timestamp']>=col_timestamp_before_60T)]
    # Number of 'B' trades in that window
    tmp_2 = tmp.groupby(['Id']).apply(lambda x: (x['tradeType']=='B').sum())
    df_cnt = pd.concat([df_cnt, tmp_2])
IIUC, use get_dummies, then groupby on Id and rolling.
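A minimal sketch of the get_dummies + groupby + rolling idea, not necessarily the exact code the answer had in mind. It assumes the columns are named Id and tradeType as in the tables above, uses a shortened copy of the sample data, and writes the window as '60min' (the same length as '60T', which newer pandas versions deprecate). The names rows, dummies, counts and result are only illustrative.

```python
import pandas as pd

# Shortened copy of the question's sample data (assumption: same column names).
rows = [
    ("2018-07-17 16:59:57", 1, "D"),
    ("2018-07-17 17:30:31", 1, "W"),
    ("2018-07-16 15:18:18", 2, "B"),
    ("2018-07-16 15:20:19", 2, "S"),
    ("2018-07-16 15:21:37", 2, "B"),
    ("2018-07-16 15:29:23", 2, "R"),
    ("2018-07-16 16:23:24", 2, "D"),
    ("2018-07-17 12:02:39", 2, "D"),
]
df = pd.DataFrame(rows, columns=["timestamp", "Id", "tradeType"])
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Time-based rolling needs the timestamps to be ordered within each Id.
df = df.sort_values(["Id", "timestamp"]).set_index("timestamp")

# One 0/1 indicator column per trade type (B, D, R, S, W).
dummies = pd.get_dummies(df["tradeType"]).astype(int)

# Per Id, sum each indicator over the trailing 60 minutes (current trade included).
counts = (
    dummies.groupby(df["Id"].to_numpy())
    .rolling("60min")
    .sum()
    .round()
    .astype(int)
)

# The rows come back grouped by Id in the same order as the sorted frame,
# so the counts can be attached positionally.
result = df.copy()
for col in counts.columns:
    result[f"{col}_count_60T"] = counts[col].to_numpy()

print(result)
```

Every row now carries a count for every trade type, so there are no NaNs to fill afterwards. Note that a time-based rolling window is closed on the right only by default, so a trade exactly 60 minutes earlier is excluded; passing closed='both' to rolling should match the >= / <= comparison in the loop above.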