Here is my data:
times = pd.date_range(start=pd.Timestamp.now(), end=pd.Timestamp.now() + pd.Timedelta(minutes=1),
periods=61)
data = np.arange(61)
df = pd.DataFrame({'times': times, 'data': data})
Output:
times data
0 2024-03-20 10:38:44.100877000 0
1 2024-03-20 10:38:45.100877416 1
2 2024-03-20 10:38:46.100877833 2
3 2024-03-20 10:38:47.100878250 3
4 2024-03-20 10:38:48.100878666 4
.. ... ...
56 2024-03-20 10:39:40.100900333 56
57 2024-03-20 10:39:41.100900750 57
58 2024-03-20 10:39:42.100901166 58
59 2024-03-20 10:39:43.100901583 59
60 2024-03-20 10:39:44.100902000 60
If I want to group it into rolling 2-second windows, I can do:
df_windows = df.rolling(on='times', window=pd.Timedelta(seconds=2))
for window in df_windows:
    print(window)
Then I get:
                            data
times
2024-03-20 10:48:09.273265     0
                               data
times
2024-03-20 10:48:09.273265000     0
2024-03-20 10:48:10.273265333     1
                               data
times
2024-03-20 10:48:10.273265333     1
2024-03-20 10:48:11.273265666     2
                               data
times
2024-03-20 10:48:11.273265666     2
2024-03-20 10:48:12.273266000     3
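The same per-row behaviour shows up when aggregating instead of iterating: a time-based rolling window produces exactly one result per row, and with an offset window the default is closed="right", so each window covers (t - 2s, t]. A minimal sketch on made-up, evenly spaced data (the timestamps and values are illustrative only):

```python
import numpy as np
import pandas as pd

# Illustrative data: four rows, one second apart.
df = pd.DataFrame({
    "times": pd.date_range("2024-03-20 10:00:00", periods=4, freq="1s"),
    "data": np.arange(4),
})

# One window per row, each covering (t - 2s, t].
result = df.rolling(window=pd.Timedelta(seconds=2), on="times").sum()
sums = result["data"]
print(sums.tolist())  # [0.0, 1.0, 3.0, 5.0]
```

Every row gets its own window, which matches the two-row windows printed above and is exactly what the rest of the question is trying to avoid.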
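Cool. But what if I don't want a window computed for every row? pandas seems to lack the functionality to do that. For example, rolling gained a step parameter (https://github.com/pandas-dev/pandas/issues/15354), but it doesn't work in this case: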
df_windows = df.rolling(on='times', window=pd.Timedelta(seconds=2), step=2)
NotImplementedError: step is not supported with frequency windows
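It also wouldn't make much sense, because 2 is not a meaningful step here: the step should be a pd.Timedelta object, but the step argument has to be an integer.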
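For contrast, step does work with an integer (row-count) window, where it just evaluates every step-th row, equivalent to slicing the full rolling result with [::step]. A minimal sketch with made-up data, assuming a pandas version that has the step argument (it appeared in the 1.5 series):

```python
import numpy as np
import pandas as pd

# Illustrative data: six rows with values 0..5.
df = pd.DataFrame({"data": np.arange(6)})

# Row-count window of 2, evaluated only at every 2nd row (index 0, 2, 4).
# The window at row 0 has a single row, so it is NaN under the default
# min_periods (which equals the window size for integer windows).
stepped = df.rolling(window=2, step=2).sum()

# Same values as computing every window and then keeping every 2nd result.
full = df.rolling(window=2).sum()
print(stepped)
```

So step is a stride over rows, not over time, which is why it has no natural meaning for a pd.Timedelta window.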
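So rolling doesn't seem able to do what I want. What is the workaround in pandas? I want something that works with irregular data, i.e. that does not rely on my timestamps having a fixed frequency. I can use groupby to get time groups, but I haven't found a way to get overlapping windows with groupby...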
One way to achieve this is to write a custom function that builds the rolling windows manually:
import pandas as pd
import numpy as np

# Sample data (call Timestamp.now() once so start and end are exactly 1 minute apart)
start = pd.Timestamp.now()
times = pd.date_range(start=start, end=start + pd.Timedelta(minutes=1), periods=61)
data = np.arange(61)
df = pd.DataFrame({'times': times, 'data': data})

def rolling_window_custom(df, window_size):
    # Build one window per row: the rows whose timestamp falls in
    # [start_time, start_time + window_size). Note that this looks forward,
    # whereas df.rolling uses (t - window, t].
    results = []
    for start_time in df['times']:
        end_time = start_time + window_size
        # Filter rows within the time window (a full scan per row, so O(n^2) overall)
        window_df = df[(df['times'] >= start_time) & (df['times'] < end_time)]
        results.append(window_df)
    return results

# Usage
window_size = pd.Timedelta(seconds=2)  # Window size as a Timedelta
windows = rolling_window_custom(df, window_size)
for window in windows:
    print(window)
    # Perform your operation on each window here
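If the goal is windows whose start times advance by a pd.Timedelta rather than one window per row, a variation on the function above is to place the window starts on a regular time grid and find each window's rows with a binary search. This is a sketch under a few assumptions: the time column is sorted ascending, windows are half-open [start, start + window) as in rolling_window_custom, and the grid begins at the first timestamp. time_windows is a hypothetical helper name, not a pandas API, and the sample data is made up.

```python
import pandas as pd


def time_windows(df, time_col, window, step):
    """Yield (window_start, rows) for half-open windows [start, start + window).

    Hypothetical helper (not part of pandas). Window starts lie on a regular
    grid spaced `step` apart, beginning at the first timestamp, so they do not
    depend on the rows being evenly spaced. Assumes df[time_col] is sorted.
    """
    times = df[time_col]
    starts = pd.date_range(start=times.iloc[0], end=times.iloc[-1], freq=step)
    # Position of the first row at or after each window start / window end.
    lo = times.searchsorted(starts, side="left")
    hi = times.searchsorted(starts + window, side="left")
    for start, i, j in zip(starts, lo, hi):
        yield start, df.iloc[i:j]


# Irregularly spaced example data (illustrative values only).
df = pd.DataFrame({
    "times": pd.to_datetime([
        "2024-03-20 10:00:00.0", "2024-03-20 10:00:00.5", "2024-03-20 10:00:01.7",
        "2024-03-20 10:00:02.2", "2024-03-20 10:00:05.0",
    ]),
    "data": [0, 1, 2, 3, 4],
})

# Overlapping 2-second windows whose starts advance by 1 second.
sums = {
    start: rows["data"].sum()
    for start, rows in time_windows(df, "times", pd.Timedelta(seconds=2), pd.Timedelta(seconds=1))
}
for start, total in sums.items():
    print(start, total)
```

Each window costs two binary searches instead of a boolean mask over the whole frame, and windows that contain no rows (such as the one starting at 10:00:03 here) are still yielded as empty frames, so gaps in irregular data stay visible.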