Pandas: accumulate time continuously while a condition is true


I would like the duration (time difference) to accumulate for as long as "state" == 1 is active, and to show "off" otherwise:

    timestamp         state
    2020-01-01 00:00:00 0
    2020-01-01 00:00:01 0
    2020-01-01 00:00:02 0
    2020-01-01 00:00:03 1
    2020-01-01 00:00:04 1
    2020-01-01 00:00:05 1
    2020-01-01 00:00:06 1
    2020-01-01 00:00:07 0
    2020-01-01 00:00:08 0
    2020-01-01 00:00:09 0
    2020-01-01 00:00:10 0
    2020-01-01 00:00:11 1
    2020-01-01 00:00:12 1
    2020-01-01 00:00:13 1
    2020-01-01 00:00:14 1
    2020-01-01 00:00:15 1
    2020-01-01 00:00:16 1
    2020-01-01 00:00:17 0
    2020-01-01 00:00:18 0
    2020-01-01 00:00:19 0
    2020-01-01 00:00:20 0

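Based on similar questions, I tried a few things with groupby, but the code does not stop (reset) the time difference when "state" == 0.
I also tried applying a lambda function (commented out below), but it raises "KeyError: ('state', 'occurred at index timestamp')". Any idea how to do this better?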

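    import numpy as np
    import pandas as pd
    
    dt = pd.date_range('2020-01-01', '2020-01-01 00:00:20',freq='1s')
    s = [0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,1,1,0,0,0,0]
    df = pd.DataFrame({'timestamp': dt,
                       'state': s})
    
    # 'timestamp' is already datetime64 here, so this conversion changes nothing
    df['timestamp']=pd.to_datetime(df.timestamp, format='%Y-%m-%d %H:%M:%S')
    # Attempt 1: groupby('state') pools every row with the same state, so the diff is not reset between runs
    df['tdiff']=(df.groupby('state').diff().timestamp.values/60)
    # Attempt 2 (raises KeyError): without axis=1, apply() passes each column, not each row, to the lambda
    #df['tdiff'] = df.apply(lambda x: x['timestamp'].diff().state.values/60 if x['state'] == 1 else 'off')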
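A quick way to see why the groupby attempt does not reset: grouping by `state` alone creates only two groups (all zeros, all ones), so `diff()` compares each row with the previous row of the same state anywhere in the frame. The snippet below rebuilds the sample data from the question and converts the differences to seconds, which is an assumption about the desired unit (the expected output shows 1.0 per one-second step).

```python
import pandas as pd

# Rebuild the question's sample frame (21 samples, one second apart).
dt = pd.date_range("2020-01-01", "2020-01-01 00:00:20", freq="1s")
s = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
df = pd.DataFrame({"timestamp": dt, "state": s})

# groupby('state') builds just two groups: all state==0 rows and all state==1 rows.
# diff() therefore measures from the previous row with the SAME state, even if a
# block of the other state lies in between.
naive = df.groupby("state")["timestamp"].diff().dt.total_seconds()

print(naive[3], naive[11])  # nan 5.0  (row 11 is measured from row 6, not reset)
print(naive[7])             # 5.0      (state==0 rows get values as well)
```

What is missing is a key that changes at every run boundary, not just a key per state value.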

The desired output is:

    timestamp        state tdiff accum.
    2020-01-01 00:00:00 0   off 0
    2020-01-01 00:00:01 0   off 0
    2020-01-01 00:00:02 0   off 0
    2020-01-01 00:00:03 1   nan 0   
    2020-01-01 00:00:04 1   1.0 1.0
    2020-01-01 00:00:05 1   1.0 2.0
    2020-01-01 00:00:06 1   1.0 3.0
    2020-01-01 00:00:07 0   off 0
    2020-01-01 00:00:08 0   off 0
    2020-01-01 00:00:09 0   off 0
    2020-01-01 00:00:10 0   off 0
    2020-01-01 00:00:11 1   nan 0
    2020-01-01 00:00:12 1   1.0 1.0
    2020-01-01 00:00:13 1   1.0 2.0
    2020-01-01 00:00:14 1   1.0 3.0
    2020-01-01 00:00:15 1   1.0 4.0
    2020-01-01 00:00:16 1   1.0 5.0
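A minimal sketch of one way to produce exactly this table, including the 0 in `accum` for "off" rows and at the first row of each run. It assumes `state` only takes the values 0 and 1, that timestamps are sorted, and that differences are wanted in seconds. The run id increments on every change of state, so each contiguous block is its own group.

```python
import pandas as pd

dt = pd.date_range("2020-01-01", "2020-01-01 00:00:20", freq="1s")
s = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
df = pd.DataFrame({"timestamp": dt, "state": s})

on = df["state"].eq(1)
# Run id: increases by one whenever the state changes, so every contiguous
# block of identical states gets its own label.
run_id = df["state"].ne(df["state"].shift()).cumsum()

# Seconds since the previous sample of the same run (NaN at each run start).
step = df["timestamp"].groupby(run_id).diff().dt.total_seconds()

# Running total of "on" time per run: 0 at a run start and on every "off" row.
accum = step.where(on, 0).fillna(0).groupby(run_id).cumsum()

# tdiff keeps NaN at the start of a run and 'off' where state == 0.
df["tdiff"] = step.astype(object).where(on, "off")
df["accum"] = accum
print(df)
```

Because `diff()` is computed on the timestamps rather than by counting rows, this should also give the elapsed time when sampling is irregular.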
Tags: python, pandas, performance, numpy, time-series
2 answers

Answer 1 (2 votes)

You can use groupby with cumsum to create an additional group key:
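To see what that extra key looks like on the question's data: `df['state'].eq(0).cumsum()` increases on every zero and stays flat across a run of ones, so after filtering to the nonzero rows each run carries a single, distinct label.

```python
import pandas as pd

dt = pd.date_range("2020-01-01", "2020-01-01 00:00:20", freq="1s")
s = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
df = pd.DataFrame({"timestamp": dt, "state": s})

# The counter increases on every state==0 row and stays constant across a
# run of ones, so all rows of one run share the same value.
key = df["state"].eq(0).cumsum()
print(key[df["state"].ne(0)].tolist())  # [3, 3, 3, 3, 7, 7, 7, 7, 7, 7]
```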

    g = df.loc[df['state'].ne(0)].groupby(df['state'].eq(0).cumsum())['timestamp']
    s1 = g.diff().dt.total_seconds()
    # transform keeps the original row index (apply prepends the group key in pandas >= 2.0)
    s2 = g.transform(lambda x : x.diff().dt.total_seconds().cumsum())
    df['tdiff'] = 'off'
    df.loc[df['state'].ne(0),'tdiff'] = s1
    
    df['accum'] = s2
    # notice I did not fillna with 0, you can do it with df['accum'] = df['accum'].fillna(0)

    df
    Out[53]: 
                 timestamp  state tdiff  accum
    0  2020-01-01 00:00:00      0   off    NaN
    1  2020-01-01 00:00:01      0   off    NaN
    2  2020-01-01 00:00:02      0   off    NaN
    3  2020-01-01 00:00:03      1   NaN    NaN
    4  2020-01-01 00:00:04      1     1    1.0
    5  2020-01-01 00:00:05      1     1    2.0
    6  2020-01-01 00:00:06      1     1    3.0
    7  2020-01-01 00:00:07      0   off    NaN
    8  2020-01-01 00:00:08      0   off    NaN
    9  2020-01-01 00:00:09      0   off    NaN
    10 2020-01-01 00:00:10      0   off    NaN
    11 2020-01-01 00:00:11      1   NaN    NaN
    12 2020-01-01 00:00:12      1     1    1.0
    13 2020-01-01 00:00:13      1     1    2.0
    14 2020-01-01 00:00:14      1     1    3.0
    15 2020-01-01 00:00:15      1     1    4.0
    16 2020-01-01 00:00:16      1     1    5.0
    17 2020-01-01 00:00:17      0   off    NaN
    18 2020-01-01 00:00:18      0   off    NaN
    19 2020-01-01 00:00:19      0   off    NaN
    20 2020-01-01 00:00:20      0   off    NaN
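Since the question is tagged performance and numpy, here is a loop-free NumPy sketch of the `accum` column that avoids `groupby().apply` entirely. It is not taken from either answer; the helper name `accumulate_on_time` is hypothetical, and it assumes sorted, tz-naive timestamps and a 0/1 `state`. It reports 0 (not NaN) on "off" rows and at the start of each run, matching the question's expected output. The trick is a global cumulative sum of the "on" steps, minus its value at the most recent reset row, found with `np.maximum.accumulate` over reset indices.

```python
import numpy as np
import pandas as pd


def accumulate_on_time(ts: pd.Series, state: pd.Series) -> np.ndarray:
    """Seconds accumulated inside each contiguous run of state == 1; 0 elsewhere."""
    ns = ts.to_numpy(dtype="datetime64[ns]").astype("int64")  # nanoseconds since epoch
    on = state.to_numpy() == 1
    n = len(ns)

    # A step counts only when this row and the previous row are both "on".
    both_on = on[1:] & on[:-1]
    inc = np.zeros(n)
    inc[1:] = np.where(both_on, np.diff(ns) / 1e9, 0.0)
    total = np.cumsum(inc)

    # Rows that restart the count: row 0, every off row, every run start.
    reset = np.r_[True, ~both_on]
    # For each row, the index of the most recent reset row at or before it.
    last_reset = np.maximum.accumulate(np.where(reset, np.arange(n), 0))

    return np.where(on, total - total[last_reset], 0.0)


dt = pd.date_range("2020-01-01", "2020-01-01 00:00:20", freq="1s")
s = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
df = pd.DataFrame({"timestamp": dt, "state": s})
df["accum"] = accumulate_on_time(df["timestamp"], df["state"])
print(df["accum"].tolist())
```

Subtracting a running baseline instead of grouping keeps everything in a handful of vectorized array operations, which is usually what matters on long time series.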

Answer 2 (0 votes)

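    import pandas as pd

    def function1(dd: pd.DataFrame) -> pd.DataFrame:
        # Each state==0 row forms its own one-row group (see col1 below), so
        # one-row groups are treated as "off". Caveat: a state==1 run that is
        # only one row long is also labelled "off".
        if len(dd) < 2:
            return dd.assign(tdiff='off', accum=0)
        # Run of ones: assumes samples are exactly 1 s apart, so tdiff is 1
        # and accum is the row count since the run started.
        return dd.assign(tdiff=[pd.NA] + [1] * (len(dd) - 1), accum=range(len(dd)))

    df1 = df.copy()  # the question's frame
    col1 = df1['state'].ne(1).cumsum().rename('run')
    # group_keys=False keeps the original row index; sort_index() restores row order
    df1.groupby(['state', col1], group_keys=False).apply(function1).sort_index()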
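This answer counts rows instead of measuring time, so it only matches the elapsed seconds when consecutive samples are exactly one second apart. A small guard for that assumption, using the question's data (the second check drops one row purely for illustration):

```python
import pandas as pd

dt = pd.date_range("2020-01-01", "2020-01-01 00:00:20", freq="1s")
s = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
df = pd.DataFrame({"timestamp": dt, "state": s})

# Counting rows equals counting seconds only when every gap is exactly 1 s.
steps = df["timestamp"].diff().dropna()
regular = bool(steps.eq(pd.Timedelta(seconds=1)).all())
print(regular)  # True

# Illustration: with one sample missing, a 2 s gap appears and the check fails.
gappy = df.drop(index=5)["timestamp"].diff().dropna()
print(bool(gappy.eq(pd.Timedelta(seconds=1)).all()))  # False
```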