Rolling mean over the past n days with an if statement


I have the following DataFrame:

entry_time_flat       route_id   time_slot   duration   n_of_trips
2019-09-02 00:00:00   1_2        0-6         10         29
2019-09-04 00:00:00   3_4        6-12        15         10
2019-09-06 00:00:00   1_2        0-6         20         30
2019-09-06 00:00:00   1_2        18-20       43         30
...

I want to compute the mean of "duration" as a new feature over the past n days (n_days = 30), under the following conditions:

if "n_of_trips" >= 30:
    mean of "duration", over the last 30 days and all the past transactions, grouping by "route_id" & "time_slot"
else:
    mean of "duration", over the last 30 days and all the past transactions, grouping by "route_id" only

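Unfortunately, splitting the DataFrame into two parts (n_of_trips >= 30 and n_of_trips < 30) does not give an acceptable result, because every transaction must be included when computing the mean.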

How can I implement the if statement while computing the rolling mean over the past n days?

python pandas pandas-groupby rolling-computation
1 Answer

I'm not entirely sure I understand your goal here, but I'll give it a try.

import pandas as pd

data = {'entry_time_flat': ['2019-09-02 00:00:00', '2019-09-04 00:00:00', '2019-09-06 00:00:00', '2019-09-06 00:00:00'], 'route_id': ['1_2', '3_4', '1_2', '1_2'], 'time_slot': ['0-6', '6-12', '0-6', '18-20'], 'duration': [10, 15, 20, 43], 'n_of_trips': [29, 10, 30, 30]}
df = pd.DataFrame(data=data)
df['entry_time_flat'] = pd.to_datetime(df['entry_time_flat'])
df = df.set_index('entry_time_flat')

# Rolling mean of duration over a 30-day window, across all rows
df['duration_rolling'] = df['duration'].rolling('30D', min_periods=1).mean()
print(df)

# Group means per subset; numeric_only=True skips the string columns,
# which newer pandas versions no longer drop silently
print(df[df.n_of_trips >= 30].groupby(['route_id']).mean(numeric_only=True))
print(df[df.n_of_trips >= 30].groupby(['time_slot']).mean(numeric_only=True))
print(df[df.n_of_trips < 30].groupby(['route_id']).mean(numeric_only=True))

Output:
                route_id time_slot  duration  n_of_trips  duration_rolling
entry_time_flat                                                           
2019-09-02           1_2       0-6        10          29              10.0
2019-09-04           3_4      6-12        15          10              12.5
2019-09-06           1_2       0-6        20          30              15.0
2019-09-06           1_2     18-20        43          30              22.0
          duration  n_of_trips  duration_rolling
route_id                                        
1_2           31.5        30.0              18.5
           duration  n_of_trips  duration_rolling
time_slot                                        
0-6              20          30              15.0
18-20            43          30              22.0
          duration  n_of_trips  duration_rolling
route_id                                        
1_2             10          29              10.0
3_4             15          10              12.5

In the output you can, of course, drop the duration column.
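
If the intent is to choose, row by row, between two rolling means that are each computed over all transactions, one possible sketch is to build both columns first and then select between them with numpy.where. The helper name rolling_mean_by and the column name duration_mean_30d are made up for illustration. The sketch assumes that the 30-day window includes the current row and that the frame has a unique index after sorting by entry_time_flat.

```python
import numpy as np
import pandas as pd

trips = pd.DataFrame({
    'entry_time_flat': pd.to_datetime(
        ['2019-09-02', '2019-09-04', '2019-09-06', '2019-09-06']),
    'route_id': ['1_2', '3_4', '1_2', '1_2'],
    'time_slot': ['0-6', '6-12', '0-6', '18-20'],
    'duration': [10, 15, 20, 43],
    'n_of_trips': [29, 10, 30, 30],
})

# Time-based rolling windows need rows ordered by time; a stable sort keeps
# the original order of rows that share a timestamp.
trips = trips.sort_values('entry_time_flat', kind='stable').reset_index(drop=True)


def rolling_mean_by(frame, keys, window='30D'):
    """Hypothetical helper: rolling mean of duration within each group.

    Assumes frame is sorted by entry_time_flat and has a unique index.
    Returns a Series aligned with frame's rows.
    """
    out = pd.Series(np.nan, index=frame.index, dtype='float64')
    for _, group in frame.groupby(keys, sort=False):
        series = group.set_index('entry_time_flat')['duration']
        means = series.rolling(window, min_periods=1).mean()
        # Write back by row label so duplicate timestamps cause no ambiguity.
        out.loc[group.index] = means.to_numpy()
    return out


# Both means use every row; the condition is applied only afterwards.
by_route_slot = rolling_mean_by(trips, ['route_id', 'time_slot'])
by_route = rolling_mean_by(trips, ['route_id'])

trips['duration_mean_30d'] = np.where(
    trips['n_of_trips'] >= 30, by_route_slot, by_route)
print(trips)
```

Because both means are computed before the n_of_trips condition is applied, no rows are left out of either window. On the four sample rows this gives 10.0, 15.0, 15.0 and 43.0.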

Is this what you're looking for?
