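rows where Var1 equals 0, then compute the time between consecutive events of type 1 and the number of type 2 events between them (rows with Var1 == 0 excluded). For the data above, that gives:
Start_time: 19, Time_inbetween: 12, Event_count: 4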
Start_time: 31, Time_inbetween: 5, Event_count: 1
I do this as follows:
import numpy as np

i = 0
eventCounter = 0
lastStartTime = 0
# One result row per type 1 event: start time, time in between, event count
length = data[data['EvntType']==1].shape[0]
results = np.zeros((length,3),dtype=int)
for row in data[data['Var1'] > 0].iterrows():
    myRow = row[1]
    if myRow['EvntType'] == 1:
        results[i,0] = lastStartTime
        results[i,1] = myRow['Time'] - lastStartTime
        results[i,2] = eventCounter
        lastStartTime = myRow['Time']
        eventCounter = 0
        i += 1
    else:
        eventCounter += 1
This gives me the result I want:
>>> results[1:]
array([[19, 12,  4],
       [31,  5,  1]])
However, this seems rather roundabout, and it takes a long time on large DataFrames. How can I improve it?
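I have a pandas DataFrame in Python 2.7, and I want to iterate over the rows and get the time between two types of events, as well as a count of the other types of events in between (given conditions...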
mask = df['EvntType']==1
# 0 False
# 1 True
# ...
# 9 True
# 10 False
# Name: EvntType, dtype: bool
Find the Time values for the rows where mask is True:
times = df.loc[mask, 'Time']
# 1 19
# 7 31
# 9 36
# Name: Time, dtype: int64
Also find the ordinal (positional) indices where mask is True:
idx = np.flatnonzero(mask)
# array([1, 6, 8])
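Note that idx holds positions rather than index labels, which matters here because dropping the Var1 == 0 row leaves a gap in the labels. A minimal sketch of the difference, using a small hypothetical boolean Series (not the data from the question) and `to_numpy()`, which assumes a reasonably recent pandas:

```python
import numpy as np
import pandas as pd

# Hypothetical mask whose index labels have a gap, mimicking a frame
# from which some rows were dropped.
mask = pd.Series([False, True, False, True], index=[0, 1, 5, 7])

# Ordinal positions of the True entries (what np.flatnonzero returns).
positions = np.flatnonzero(mask.to_numpy())

# Index labels of the True entries (what boolean indexing selects).
labels = mask.index[mask.to_numpy()].to_numpy()

print(positions)  # [1 3]
print(labels)     # [1 7]
```

Counting rows between two events needs the positions, since label differences would also count the rows that were removed.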
start_time is all of the values in times[:-1]:
In [56]: times[:-1]
Out[56]:
1 19
7 31
Name: Time, dtype: int64
time_inbetween is the difference between consecutive times, np.diff(times):
In [55]: np.diff(times)
Out[55]: array([12, 5])
event_count is the difference between consecutive values of idx, minus 1:
In [57]: np.diff(idx)-1
Out[57]: array([4, 1])
import numpy as np
import pandas as pd

df = pd.DataFrame({'EvntType': [2, 1, 2, 2, 2, 2, 2, 1, 2, 1, 2],
                   'Time': [15, 19, 21, 23, 25, 26, 28, 31, 33, 36, 39],
                   'Var1': [1, 1, 6, 3, 0, 2, 3, 5, 1, 5, 1],
                   'Var2': [17, 45, 43, 65, 76, 35, 25, 16, 25, 36, 21]})

# Remove rows where Var1 equals 0
df = df.loc[df['Var1'] != 0]
mask = df['EvntType']==1
times = df.loc[mask, 'Time']
idx = np.flatnonzero(mask)

result = pd.DataFrame(
    {'start_time': times[:-1],
     'time_inbetween': np.diff(times),
     'event_count': np.diff(idx)-1})
print(result)
yields
event_count start_time time_inbetween
1 4 19 12
7 1 31 5
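For reuse, the same steps could be wrapped in a function. The sketch below is one possible packaging, not part of the original answer: the name summarize_intervals is hypothetical, it assumes the column names EvntType, Time and Var1 from the question, and it works on plain NumPy arrays, so the result gets a fresh 0-based index instead of keeping the labels 1 and 7.

```python
import numpy as np
import pandas as pd


def summarize_intervals(df):
    """Hypothetical helper: intervals between consecutive type 1 events.

    Rows with Var1 == 0 are dropped first, as in the steps above.
    """
    df = df.loc[df['Var1'] != 0]
    mask = (df['EvntType'] == 1).to_numpy()
    times = df.loc[mask, 'Time'].to_numpy()
    idx = np.flatnonzero(mask)  # positions of the type 1 rows
    return pd.DataFrame({
        'start_time': times[:-1],
        'time_inbetween': np.diff(times),
        'event_count': np.diff(idx) - 1,  # rows strictly between two events
    })


df = pd.DataFrame({'EvntType': [2, 1, 2, 2, 2, 2, 2, 1, 2, 1, 2],
                   'Time': [15, 19, 21, 23, 25, 26, 28, 31, 33, 36, 39],
                   'Var1': [1, 1, 6, 3, 0, 2, 3, 5, 1, 5, 1],
                   'Var2': [17, 45, 43, 65, 76, 35, 25, 16, 25, 36, 21]})

result = summarize_intervals(df)
# start_time: [19, 31], time_inbetween: [12, 5], event_count: [4, 1]
print(result)
```

These values match results[1:] from the loop in the question. The loop filters with Var1 > 0 while this version uses Var1 != 0, so the two agree only as long as Var1 is never negative.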