Counting consecutive occurrences in a time series without using df.iterrows()

Question

Given a data frame containing a time series like this:

time	event
2020-01-01 12:00:00	1
2020-01-01 12:00:01	NaN
2020-01-01 12:00:02	1
2020-01-01 12:00:03	1
2020-01-01 12:00:04	NaN
2020-01-01 12:00:05	NaN
2020-01-01 12:00:06	1
2020-01-01 12:00:07	NaN

I would like to obtain a summary data frame such as:

event_id	time_start	time_stop
1	2020-01-01 12:00:00	2020-01-01 12:00:01
2	2020-01-01 12:00:02	2020-01-01 12:00:04
3	2020-01-01 12:00:06	2020-01-01 12:00:07

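As a step-by-step approach, I think I should first add an empty column "event_i" and then fill it with the index of each event (1, 2, 3, ...). Once that works, I can try to build the summary data frame. I am already stuck on assigning an index to each event.

I can work something out with df.iterrows(), but that is not recommended. How can I vectorize this indexing step?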

import pandas as pd
import numpy as np
# define mini-dataset as an example
data= {'time': ['2020-01-01 12:00:00', '2020-01-01 12:00:01', '2020-01-01 12:00:02','2020-01-01 12:00:03',
              '2020-01-01 12:00:04','2020-01-01 12:00:05', '2020-01-01 12:00:06', '2020-01-01 12:00:07',
              '2020-01-01 12:00:08', '2020-01-01 12:00:09','2020-01-01 12:00:10'],
     'event': [1,np.nan,1,1,np.nan,np.nan,1,np.nan,1,1,np.nan]}
df = pd.DataFrame(data)
df['time'] = pd.to_datetime(df['time'])

# give a sequential number to each event
df['event_i'] = np.nan

# for each event-number, group by and stack: event_id,  time_start time_stop
# ...

Answer 1

Code

# Create a grouper to mark the intervals of successive events
m = df['event'].isna()
b = m.cumsum().mask(m).ffill(limit=1)

# group the time column by the grouper and aggregate with first and last
df1 = df['time'].groupby(b).agg(['first', 'last']).reset_index(drop=True)

# Create event id column
df1['event_id'] = df1.index + 1

                first                last  event_id
0 2020-01-01 12:00:00 2020-01-01 12:00:01         1
1 2020-01-01 12:00:02 2020-01-01 12:00:04         2
2 2020-01-01 12:00:06 2020-01-01 12:00:07         3
3 2020-01-01 12:00:08 2020-01-01 12:00:10         4
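
The grouper above works in three steps: cumsum() on the NaN mask gives every run of non-missing values its own number, mask(m) blanks out the NaN rows again, and ffill(limit=1) extends each run by exactly one row, so the first NaN after an event becomes its stop time. The following is a minimal sketch, not part of the original answer, that also fills the "event_i" column the question asks for and names the summary columns as in the desired output. Using rank(method='dense') to renumber the groups 1, 2, 3, ... and the nullable 'Int64' dtype are choices made here for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
start = pd.to_datetime('2020-01-01 12:00:00')
df = pd.DataFrame({
    'time': start + pd.to_timedelta(np.arange(11), unit='s'),
    'event': [1, np.nan, 1, 1, np.nan, np.nan, 1, np.nan, 1, 1, np.nan],
})

# Grouper from the answer: one label per run of events,
# extended by one row so the first NaN after a run closes it
m = df['event'].isna()
grouper = m.cumsum().mask(m).ffill(limit=1)

# Renumber the labels densely (1, 2, 3, ...); rows that belong
# to no event stay missing
df['event_i'] = grouper.rank(method='dense').astype('Int64')

# Summary with the column names from the question
summary = (
    df.dropna(subset=['event_i'])
      .groupby('event_i')['time']
      .agg(time_start='first', time_stop='last')
      .reset_index()
      .rename(columns={'event_i': 'event_id'})
)
```

With the sample data, only the row at 12:00:05 is left without an event number, because it is the second consecutive NaN and ffill(limit=1) does not reach it.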
