Identifying events in a time-series Pandas DataFrame


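I am trying to identify events in a time-series Pandas DataFrame. An event is a period in which a value is non-zero for more than 30 seconds. An event may contain values equal to 0, as long as the value is not 0 for 30 consecutive seconds or longer. If a run of non-zero values is shorter than 30 seconds and is surrounded by zeros, it is not an event. An event ends at the last non-zero value that is followed by 30 seconds or more of zeros. I would like the output to look like the "Events" column in the reprex.

Reprex: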

import pandas as pd


Timestamp = pd.date_range("11-30-2023 23:54:00", periods = 63, freq = "5s")
Value=[0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.0,0.0,0.0,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.0,0.0,0.5,0.5,0.5,0.5,0.5,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.0,0.0,0.0,0.0]
Events = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0] 
df = pd.DataFrame({"Timestamp":Timestamp, "Value":Value, "Events":Events})
python pandas time-series
1 Answer

The logic is not entirely clear, but assuming an event is a run of consecutive non-zero values lasting at least 30 seconds:

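# identify non-zero values
m = df['Value'].ne(0)
# form groups of consecutive non-zero values (the counter increments on every zero)
group = (~m).cumsum()

# compute the duration (last minus first timestamp) of each run of non-zero values
nonzero_chunks = df.loc[m, 'Timestamp'].groupby(group).agg(lambda s: s.max() - s.min())

# keep the runs lasting ≥30s
keep = nonzero_chunks[nonzero_chunks.ge(pd.Timedelta('30s'))].index

# add the new column; `m &` excludes the zero row that shares a group id with the run after it
df['Events'] = (m & group.isin(keep)).astype(int)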

Output:

             Timestamp  Value  Events
0  2023-11-30 23:54:00    0.5       1
1  2023-11-30 23:54:05    0.5       1
2  2023-11-30 23:54:10    0.5       1
3  2023-11-30 23:54:15    0.5       1
4  2023-11-30 23:54:20    0.5       1
..                 ...    ...     ...
58 2023-11-30 23:58:50    0.5       1
59 2023-11-30 23:58:55    0.0       0
60 2023-11-30 23:59:00    0.0       0
61 2023-11-30 23:59:05    0.0       0
62 2023-11-30 23:59:10    0.0       0

[63 rows x 3 columns]
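
The code above treats every zero as a break, so it does not merge runs separated by short gaps the way the question's expected "Events" column does. The following is a minimal sketch of one possible reading of the question, not a definitive implementation. The function name `label_events` and its parameters `min_duration` and `max_gap` are hypothetical. It assumes that a gap is measured between consecutive non-zero timestamps, that a gap of 30 seconds or more ends an event, and that an event must span at least 30 seconds from its first to its last non-zero value.

```python
import pandas as pd


def label_events(df, min_duration="30s", max_gap="30s"):
    """Return a 0/1 Series marking the rows that fall inside an event.

    Assumptions (not confirmed by the question): gaps are measured between
    consecutive non-zero timestamps, and both thresholds are inclusive.
    """
    min_duration = pd.Timedelta(min_duration)
    max_gap = pd.Timedelta(max_gap)

    # Work only with the non-zero rows; zeros are accounted for via time gaps.
    nonzero = df.loc[df["Value"].ne(0), "Timestamp"]

    # A new candidate event starts whenever the time since the previous
    # non-zero sample is max_gap or longer.
    event_id = nonzero.diff().ge(max_gap).cumsum()

    # First and last non-zero timestamp of each candidate event.
    spans = nonzero.groupby(event_id).agg(["min", "max"])
    # Drop candidates that are too short to count as an event.
    spans = spans[(spans["max"] - spans["min"]) >= min_duration]

    # Flag every row (zero or not) between the start and end of a kept event.
    flags = pd.Series(0, index=df.index)
    for start, end in zip(spans["min"], spans["max"]):
        flags.loc[df["Timestamp"].between(start, end)] = 1
    return flags
```

With the question's `df`, `label_events(df)` can be compared against the hand-written "Events" column, for example with `label_events(df).eq(df["Events"]).all()`. Because each event is bounded by its first and last non-zero timestamp, trailing zeros are never included, which matches the requirement that an event ends at its last non-zero value.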