Create a flag for the next hour after the last non-zero value

Question  Votes: 0  Answers: 1

I have a data frame with dummy data:


library("lubridate")
library("dplyr")

df <- data.frame(
  time = seq.POSIXt(from = ymd_hms("2017-05-12 00:00:00"), to = ymd_hms("2017-05-12 02:25:00"), by = "5 mins"),
  value = c(rep(0, 10), 1500, 0, 1000, rep(0, 17))
)

It looks like this:


                  time value
1  2017-05-12 00:00:00     0
2  2017-05-12 00:05:00     0
3  2017-05-12 00:10:00     0
4  2017-05-12 00:15:00     0
5  2017-05-12 00:20:00     0
6  2017-05-12 00:25:00     0
7  2017-05-12 00:30:00     0
8  2017-05-12 00:35:00     0
9  2017-05-12 00:40:00     0
10 2017-05-12 00:45:00     0
11 2017-05-12 00:50:00  1500
12 2017-05-12 00:55:00     0
13 2017-05-12 01:00:00  1000
14 2017-05-12 01:05:00     0
15 2017-05-12 01:10:00     0
16 2017-05-12 01:15:00     0
17 2017-05-12 01:20:00     0
18 2017-05-12 01:25:00     0
19 2017-05-12 01:30:00     0
20 2017-05-12 01:35:00     0
21 2017-05-12 01:40:00     0
22 2017-05-12 01:45:00     0
23 2017-05-12 01:50:00     0
24 2017-05-12 01:55:00     0
25 2017-05-12 02:00:00     0
26 2017-05-12 02:05:00     0
27 2017-05-12 02:10:00     0
28 2017-05-12 02:15:00     0
29 2017-05-12 02:20:00     0
30 2017-05-12 02:25:00     0

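I want to create a flag variable that indicates activity: it should be '1'/'on' at the instant the value is greater than zero and for the full hour that follows.

So if there is a 1500 at 00:50, the activity should last until 01:50, inclusive.

If another non-zero value occurs within that period, the activity must be extended for a further hour from that value.

The final result would look like this: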


                  time value flag
1  2017-05-12 00:00:00     0  OFF
2  2017-05-12 00:05:00     0  OFF
3  2017-05-12 00:10:00     0  OFF
4  2017-05-12 00:15:00     0  OFF
5  2017-05-12 00:20:00     0  OFF
6  2017-05-12 00:25:00     0  OFF
7  2017-05-12 00:30:00     0  OFF
8  2017-05-12 00:35:00     0  OFF
9  2017-05-12 00:40:00     0  OFF
10 2017-05-12 00:45:00     0  OFF
11 2017-05-12 00:50:00  1500   ON
12 2017-05-12 00:55:00     0   ON
13 2017-05-12 01:00:00  1000   ON
14 2017-05-12 01:05:00     0   ON
15 2017-05-12 01:10:00     0   ON
16 2017-05-12 01:15:00     0   ON
17 2017-05-12 01:20:00     0   ON
18 2017-05-12 01:25:00     0   ON
19 2017-05-12 01:30:00     0   ON
20 2017-05-12 01:35:00     0   ON
21 2017-05-12 01:40:00     0   ON
22 2017-05-12 01:45:00     0   ON
23 2017-05-12 01:50:00     0   ON  <-- first occurrence stops having effect
24 2017-05-12 01:55:00     0   ON  <-- effect of second occurrence
25 2017-05-12 02:00:00     0   ON  <-- continues the activity then stops
26 2017-05-12 02:05:00     0  OFF
27 2017-05-12 02:10:00     0  OFF
28 2017-05-12 02:15:00     0  OFF
29 2017-05-12 02:20:00     0  OFF
30 2017-05-12 02:25:00     0  OFF

Frankly, I don't know how to break this task down into a workable for loop or function. Any help or pointers would be greatly appreciated.

r function dataframe time-series flags
1 Answer
0 votes

We can create grouping variables based on where 'value' is greater than 0, using cummax and cumsum:

library(dplyr)
library(lubridate)

df %>%
  group_by(ind = cummax(value > 0)) %>%
  group_by(group2 = cumsum(time > (time[1] + hours(1))), add = TRUE) %>%
  mutate(flag = c("OFF", "ON")[1 + (any(value > 0))]) %>%
  ungroup %>%
  select(-ind, -group2)
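The expected output in the question keeps rows 24 and 25 ON because of the second non-zero value at 01:00, so it may be worth checking any solution against those rows. As a cross-check, here is a minimal base-R sketch of the rule as the question states it (ON from each non-zero value through the following hour, inclusive). It does not use the grouped pipeline; it tracks the time of the most recent non-zero value with cummax instead. The names `t_secs`, `last_nz` and `window_secs` are illustrative and not from the original post, and the sketch assumes the rows are already sorted by time.

```r
# Rebuild the sample data from the question (base R only).
df <- data.frame(
  time = seq(as.POSIXct("2017-05-12 00:00:00", tz = "UTC"),
             as.POSIXct("2017-05-12 02:25:00", tz = "UTC"),
             by = "5 mins"),
  value = c(rep(0, 10), 1500, 0, 1000, rep(0, 17))
)

window_secs <- 3600  # one hour; the end point is included (<=)

# Time, in seconds, of the most recent non-zero value seen so far.
# -Inf means no non-zero value has occurred yet.
t_secs  <- as.numeric(df$time)
last_nz <- cummax(ifelse(df$value > 0, t_secs, -Inf))

# ON while the row is within one hour of the most recent non-zero value.
df$flag <- ifelse(t_secs - last_nz <= window_secs, "ON", "OFF")

print(df[df$flag == "ON", ])
```

On the sample data this flags rows 11 through 25 (00:50 to 02:00) as ON, which matches the table in the question. Because each new non-zero value moves `last_nz` forward, the one-hour window restarts without any explicit loop.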