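I am trying to identify events in a time-series pandas DataFrame. An event is a stretch where the value is non-zero for more than 30 seconds. An event may contain zeros, as long as the zeros do not persist for 30 consecutive seconds or more. A stretch that is shorter than 30 seconds and surrounded by zeros is not an event. An event ends at the last non-zero value that is followed by zeros for 30 seconds or longer. I would like the output to look like the "Events" column in the reprex.
Reprex: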
import pandas as pd
Timestamp = pd.date_range("11-30-2023 23:54:00", periods = 63, freq = "5s")
Value=[0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.0,0.0,0.0,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.0,0.0,0.5,0.5,0.5,0.5,0.5,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.0,0.0,0.0,0.0]
Events = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0]
df = pd.DataFrame({"Timestamp":Timestamp, "Value":Value, "Events":Events})
The logic is not entirely clear, but assuming an event is a run of consecutive non-zero values lasting at least 30 seconds:
# identify non-zero values
m = df['Value'].ne(0)
# label runs of consecutive non-zero values: the counter increases at every zero
group = (~m).cumsum()
# duration of each run of non-zero values (last timestamp minus first)
chunks = df.loc[m, 'Timestamp'].groupby(group)
nonzero_chunks = chunks.max() - chunks.min()
# keep the runs lasting ≥30s, flag their non-zero rows in a new column
keep = nonzero_chunks[nonzero_chunks.ge(pd.Timedelta('30s'))].index
df['Events'] = (m & group.isin(keep)).astype(int)
Output:
             Timestamp  Value  Events
0  2023-11-30 23:54:00    0.5       1
1  2023-11-30 23:54:05    0.5       1
2  2023-11-30 23:54:10    0.5       1
3  2023-11-30 23:54:15    0.5       1
4  2023-11-30 23:54:20    0.5       1
..                 ...    ...     ...
58 2023-11-30 23:58:50    0.5       1
59 2023-11-30 23:58:55    0.0       0
60 2023-11-30 23:59:00    0.0       0
61 2023-11-30 23:59:05    0.0       0
62 2023-11-30 23:59:10    0.0       0

[63 rows x 3 columns]
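If short runs of zeros should instead be absorbed into the surrounding event, as the question's description suggests, a different grouping is needed. Below is a minimal sketch under my own reading of the rules: two non-zero samples belong to the same event when they are less than 30 seconds apart, and an event is kept only if it spans at least 30 seconds from its first to its last non-zero sample. The function name `label_events` and the parameters `min_duration` and `min_gap` are hypothetical, not part of pandas, and whether the thresholds should be strict or inclusive is an assumption.

```python
import pandas as pd

def label_events(df, min_duration="30s", min_gap="30s"):
    """Flag rows belonging to events; zero gaps shorter than min_gap are absorbed."""
    min_duration = pd.Timedelta(min_duration)
    min_gap = pd.Timedelta(min_gap)

    # timestamps of the non-zero samples only
    nz = df.loc[df["Value"].ne(0), "Timestamp"]
    # start a new candidate event whenever the time since the previous
    # non-zero sample reaches min_gap (assumption: inclusive threshold)
    event_id = nz.diff().ge(min_gap).cumsum()
    # first and last non-zero timestamp of each candidate event
    spans = nz.groupby(event_id).agg(["min", "max"])
    # drop candidates that are too short
    spans = spans[(spans["max"] - spans["min"]).ge(min_duration)]

    # flag every row (zeros included) between the start and end of a kept event
    events = pd.Series(0, index=df.index)
    for start, end in zip(spans["min"], spans["max"]):
        events.loc[df["Timestamp"].between(start, end)] = 1
    return events

# same data as the reprex, written compactly
Value = ([0.5] * 8 + [0.0] * 3 + [0.5] * 7 + [0.0] * 2 + [0.5] * 6
         + [0.0] * 10 + [0.5] * 2 + [0.0] * 10 + [0.5] * 11 + [0.0] * 4)
df = pd.DataFrame({
    "Timestamp": pd.date_range("2023-11-30 23:54:00", periods=63, freq="5s"),
    "Value": Value,
})
df["Events"] = label_events(df)
```

On this data the sketch flags rows 0 to 25 and rows 48 to 58, which appears to match the expected "Events" column: the short zero gaps at rows 8 to 10 and 18 to 19 are absorbed, and the isolated 5-second blip at rows 36 to 37 is dropped.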