pandas: How do I aggregate records into rolling time windows at a given frequency?


Here is my data:

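import numpy as np
import pandas as pd

# 61 timestamps spanning one minute from now, roughly one per second
times = pd.date_range(start=pd.Timestamp.now(), end=pd.Timestamp.now() + pd.Timedelta(minutes=1),
                      periods=61)
data = np.arange(61)
df = pd.DataFrame({'times': times, 'data': data})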

Output:

                           times  data
0  2024-03-20 10:38:44.100877000     0
1  2024-03-20 10:38:45.100877416     1
2  2024-03-20 10:38:46.100877833     2
3  2024-03-20 10:38:47.100878250     3
4  2024-03-20 10:38:48.100878666     4
..                           ...   ...
56 2024-03-20 10:39:40.100900333    56
57 2024-03-20 10:39:41.100900750    57
58 2024-03-20 10:39:42.100901166    58
59 2024-03-20 10:39:43.100901583    59
60 2024-03-20 10:39:44.100902000    60

If I want to group it using a 2-second rolling window, I can do this:

df_windows = df.rolling(on='times', window=pd.Timedelta(seconds=2))
for window in df_windows:
    print(window)

And then I get:

                            data
times                           
2024-03-20 10:48:09.273265     0
                               data
times                              
2024-03-20 10:48:09.273265000     0
2024-03-20 10:48:10.273265333     1
                               data
times                              
2024-03-20 10:48:10.273265333     1
2024-03-20 10:48:11.273265666     2
                               data
times                              
2024-03-20 10:48:11.273265666     2
2024-03-20 10:48:12.273266000     3
...

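Cool. But if I don't want a window computed relative to every single row, pandas seems to lack the functionality to do that. For example, rolling gained a step parameter (https://github.com/pandas-dev/pandas/issues/15354), but it does not work in this case: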

df_windows = df.rolling(on='times', window=pd.Timedelta(seconds=2), step=2)

NotImplementedError: step is not supported with frequency windows

It would not make much sense anyway, because 2 is not a meaningful step here: it should be a pd.Timedelta object, but the step parameter must be an integer.
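For contrast, here is a minimal sketch of what step does when the window is a row count rather than a time span. It assumes pandas 1.5 or later (the release that added step), and the small series is made-up sample data. The step counts rows, not time, which is why it does not carry over to a Timedelta window.

```python
import pandas as pd

# Six values; the window here is a number of rows, not a time span
s = pd.Series(range(6), dtype="float64")

# 2-row window, evaluated only at every 2nd row (needs pandas >= 1.5)
stepped = s.rolling(window=2, step=2).sum()

# The same rows can be obtained by slicing the unstepped result
sliced = s.rolling(window=2).sum()[::2]

print(stepped)
```

The stepped result keeps only the rows at positions 0, 2 and 4, so the step is purely positional.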

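So it seems that rolling cannot do what I want. What is the workaround in pandas, then? I want something that can handle irregular data, i.e. something that does not rely on my timestamps having some fixed frequency. I can use groupby to get time groups, but I have not found a way to get overlapping windows with groupby...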
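To illustrate the groupby limitation described above, here is a minimal sketch using pd.Grouper on irregular timestamps. The sample values are hypothetical. Each row lands in exactly one 2-second bin, so the groups are clock-aligned but never overlap.

```python
import numpy as np
import pandas as pd

# Irregularly spaced timestamps (hypothetical sample data)
times = pd.to_datetime([
    "2024-03-20 10:00:00.0",
    "2024-03-20 10:00:00.7",
    "2024-03-20 10:00:01.9",
    "2024-03-20 10:00:03.2",
    "2024-03-20 10:00:05.5",
])
df = pd.DataFrame({"times": times, "data": np.arange(5)})

# Fixed 2-second bins: [10:00:00, 10:00:02), [10:00:02, 10:00:04), ...
binned = df.groupby(pd.Grouper(key="times", freq="2s"))["data"].sum()
print(binned)
```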

python pandas group-by pandas-rolling
1 Answer

One way to achieve this is to use a custom function that generates the desired rolling windows manually:

import pandas as pd
import numpy as np

# Sample data
times = pd.date_range(start=pd.Timestamp.now(), end=pd.Timestamp.now() + pd.Timedelta(minutes=1), periods=61)
data = np.arange(61)
df = pd.DataFrame({'times': times, 'data': data})

def rolling_window_custom(df, window):
    # window is a pd.Timedelta; each window is the half-open interval
    # [start_time, start_time + window), and one window starts at every row
    results = []

    for start_time in df['times']:
        end_time = start_time + window
        # Keep the rows whose timestamp falls inside the window
        window_df = df[(df['times'] >= start_time) & (df['times'] < end_time)]
        results.append(window_df)

    return results

# Usage
# Pass the Timedelta itself: its .seconds attribute would drop days and fractional seconds
window = pd.Timedelta(seconds=2)
windows = rolling_window_custom(df, window)

for window in windows:
    print(window)
    # Perform your operation on each window here
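
The function above still starts one window at every row. If the windows should instead start at a fixed time step regardless of where the rows fall, one possible variation is sketched below. The name stepped_windows and its parameters are hypothetical, not part of pandas, and the sample timestamps are made up. It assumes tz-naive timestamps and treats each window as the half-open interval [start, start + window).

```python
import numpy as np
import pandas as pd


def stepped_windows(df, on, window, step):
    """Yield (window_start, rows) for windows [start, start + window),
    with starts advancing by `step` from the earliest timestamp."""
    ordered = df.sort_values(on)
    ts = ordered[on].to_numpy()
    # Window starts: the first timestamp, then every `step` after it
    starts = pd.date_range(start=ordered[on].iloc[0],
                           end=ordered[on].iloc[-1], freq=step)
    # Binary search on the sorted timestamps gives each window's row range
    lo = np.searchsorted(ts, starts.to_numpy(), side="left")
    hi = np.searchsorted(ts, (starts + window).to_numpy(), side="left")
    for start, i, j in zip(starts, lo, hi):
        # A window may be empty if no row falls inside it
        yield start, ordered.iloc[i:j]


# Irregularly spaced timestamps (hypothetical sample data)
times = pd.to_datetime([
    "2024-03-20 10:00:00.0",
    "2024-03-20 10:00:00.7",
    "2024-03-20 10:00:01.9",
    "2024-03-20 10:00:03.2",
    "2024-03-20 10:00:05.5",
])
df = pd.DataFrame({"times": times, "data": np.arange(5)})

# 2-second windows that start every 1 second, so neighbours overlap
sums = {
    start: rows["data"].sum()
    for start, rows in stepped_windows(
        df, on="times",
        window=pd.Timedelta(seconds=2), step=pd.Timedelta(seconds=1),
    )
}
result = pd.Series(sums)
print(result)
```

Because the step is shorter than the window, a row can appear in more than one window, which is the overlap that groupby alone does not provide. Neither the window nor the step depends on the spacing of the rows.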
