I have the following DataFrame:
entry_time_flat      route_id  time_slot  duration  n_of_trips
2019-09-02 00:00:00  1_2       0-6        10        29
2019-09-04 00:00:00  3_4       6-12       15        10
2019-09-06 00:00:00  1_2       0-6        20        30
2019-09-06 00:00:00  1_2       18-20      43        30
...
I want to compute the mean of "duration" over the last n days (n_days = 30) and store it as a new feature, subject to the following condition:
if "n_of_trips" >= 30:
    mean of "duration", over the last 30 days and all the past transactions, grouping by "route_id" & "time_slot"
else:
    mean of "duration", over the last 30 days and all the past transactions, grouping by "route_id" only
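Unfortunately, splitting the DataFrame into two parts (n_of_trips >= 30 and n_of_trips < 30) does not give an acceptable result, because every transaction has to be included when the mean is computed.
How can I apply an if statement while computing the rolling mean over the last n days?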
I'm not entirely sure I understand your goal, but I'll give it a try.
import pandas as pd

data = {'entry_time_flat': ['2019-09-02 00:00:00', '2019-09-04 00:00:00', '2019-09-06 00:00:00', '2019-09-06 00:00:00'], 'route_id': ['1_2', '3_4', '1_2', '1_2'], 'time_slot': ['0-6', '6-12', '0-6', '18-20'], 'duration': [10, 15, 20, 43], 'n_of_trips': [29, 10, 30, 30]}
df = pd.DataFrame(data=data)
# A time-based rolling window needs a sorted datetime index
df['entry_time_flat'] = pd.to_datetime(df['entry_time_flat'])
df = df.set_index('entry_time_flat')
# 30-day rolling mean of duration over all rows (no grouping)
df['duration_rolling'] = df['duration'].rolling('30D', min_periods=1).mean()
print(df)
# numeric_only=True skips the string columns, which recent pandas versions refuse to average
print(df[df.n_of_trips >= 30].groupby(['route_id']).mean(numeric_only=True))
print(df[df.n_of_trips >= 30].groupby(['time_slot']).mean(numeric_only=True))
print(df[df.n_of_trips < 30].groupby(['route_id']).mean(numeric_only=True))
Output:
                route_id time_slot  duration  n_of_trips  duration_rolling
entry_time_flat
2019-09-02           1_2       0-6        10          29              10.0
2019-09-04           3_4      6-12        15          10              12.5
2019-09-06           1_2       0-6        20          30              15.0
2019-09-06           1_2     18-20        43          30              22.0
          duration  n_of_trips  duration_rolling
route_id
1_2           31.5        30.0              18.5
           duration  n_of_trips  duration_rolling
time_slot
0-6              20          30              15.0
18-20            43          30              22.0
          duration  n_of_trips  duration_rolling
route_id
1_2             10          29              10.0
3_4             15          10              12.5
You can of course drop the duration column from the output.
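If what you need instead is a per-row choice between the two groupings, one possible approach is to compute both rolling means on the full DataFrame and only then pick one per row, so that no transaction is left out of either mean. The following is a minimal sketch, not a definitive solution. The helper name rolling_mean_by and the column name duration_mean_30d are made up for illustration. It also assumes that the 30-day window includes the current row and that the current row's n_of_trips decides which mean is used; the question does not say either way.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'entry_time_flat': pd.to_datetime(['2019-09-02', '2019-09-04', '2019-09-06', '2019-09-06']),
    'route_id': ['1_2', '3_4', '1_2', '1_2'],
    'time_slot': ['0-6', '6-12', '0-6', '18-20'],
    'duration': [10, 15, 20, 43],
    'n_of_trips': [29, 10, 30, 30],
})

def rolling_mean_by(frame, keys, window='30D'):
    """Rolling mean of duration inside each group of keys (hypothetical helper).

    The result is aligned to frame's original index, so it can be
    compared row by row with the other columns.
    """
    out = pd.Series(np.nan, index=frame.index, dtype=float)
    for _, grp in frame.groupby(keys):
        # Time-based windows require the timestamps to be sorted
        grp = grp.sort_values('entry_time_flat')
        durations = grp.set_index('entry_time_flat')['duration']
        rolled = durations.rolling(window, min_periods=1).mean()
        # Write the values back at the rows they came from
        out.loc[grp.index] = rolled.to_numpy()
    return out

# Both means are computed on ALL rows, regardless of n_of_trips
mean_route_slot = rolling_mean_by(df, ['route_id', 'time_slot'])
mean_route = rolling_mean_by(df, ['route_id'])

# Assumption: the current row's n_of_trips selects which mean applies
df['duration_mean_30d'] = np.where(df['n_of_trips'] >= 30, mean_route_slot, mean_route)
print(df)
```

The loop over groups is slow on large data but keeps the row alignment explicit. If the current row should be excluded from its own mean, the rolling call would need a closed='left' window, which this sketch does not do.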
Is this what you were looking for?