I have a data frame with dummy data:
library("lubridate")
library("dplyr")
df <- data.frame(
time = seq.POSIXt(from = ymd_hms("2017-05-12 00:00:00"), to = ymd_hms("2017-05-12 02:25:00"), by = "5 mins"),
value = c(rep(0, 10), 1500, 0, 1000, rep(0,17))
)
It looks like this:
time value
1 2017-05-12 00:00:00 0
2 2017-05-12 00:05:00 0
3 2017-05-12 00:10:00 0
4 2017-05-12 00:15:00 0
5 2017-05-12 00:20:00 0
6 2017-05-12 00:25:00 0
7 2017-05-12 00:30:00 0
8 2017-05-12 00:35:00 0
9 2017-05-12 00:40:00 0
10 2017-05-12 00:45:00 0
11 2017-05-12 00:50:00 1500
12 2017-05-12 00:55:00 0
13 2017-05-12 01:00:00 1000
14 2017-05-12 01:05:00 0
15 2017-05-12 01:10:00 0
16 2017-05-12 01:15:00 0
17 2017-05-12 01:20:00 0
18 2017-05-12 01:25:00 0
19 2017-05-12 01:30:00 0
20 2017-05-12 01:35:00 0
21 2017-05-12 01:40:00 0
22 2017-05-12 01:45:00 0
23 2017-05-12 01:50:00 0
24 2017-05-12 01:55:00 0
25 2017-05-12 02:00:00 0
26 2017-05-12 02:05:00 0
27 2017-05-12 02:10:00 0
28 2017-05-12 02:15:00 0
29 2017-05-12 02:20:00 0
30 2017-05-12 02:25:00 0
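I want to create a flag variable that indicates activity: it should be '1'/'ON' at the instant the value is greater than zero and for the full hour that follows.
So if there is a 1500 at 00:50, the activity should last up to and including 01:50.
If another nonzero value occurs within that period, the activity must continue for a further hour counted from that value.
The final result would look like this: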
time value flag
1 2017-05-12 00:00:00 0 OFF
2 2017-05-12 00:05:00 0 OFF
3 2017-05-12 00:10:00 0 OFF
4 2017-05-12 00:15:00 0 OFF
5 2017-05-12 00:20:00 0 OFF
6 2017-05-12 00:25:00 0 OFF
7 2017-05-12 00:30:00 0 OFF
8 2017-05-12 00:35:00 0 OFF
9 2017-05-12 00:40:00 0 OFF
10 2017-05-12 00:45:00 0 OFF
11 2017-05-12 00:50:00 1500 ON
12 2017-05-12 00:55:00 0 ON
13 2017-05-12 01:00:00 1000 ON
14 2017-05-12 01:05:00 0 ON
15 2017-05-12 01:10:00 0 ON
16 2017-05-12 01:15:00 0 ON
17 2017-05-12 01:20:00 0 ON
18 2017-05-12 01:25:00 0 ON
19 2017-05-12 01:30:00 0 ON
20 2017-05-12 01:35:00 0 ON
21 2017-05-12 01:40:00 0 ON
22 2017-05-12 01:45:00 0 ON
23 2017-05-12 01:50:00 0 ON <-- first occurrence stops having effect
24 2017-05-12 01:55:00 0 ON <-- effect of second occurrence
25 2017-05-12 02:00:00 0 ON <-- continues the activity then stops
26 2017-05-12 02:05:00 0 OFF
27 2017-05-12 02:10:00 0 OFF
28 2017-05-12 02:15:00 0 OFF
29 2017-05-12 02:20:00 0 OFF
30 2017-05-12 02:25:00 0 OFF
Frankly, I don't know how to break this task down into a workable for loop or function. Any help or hints would be greatly appreciated.
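We can create a grouping variable with cumsum on the occurrences of 'value' greater than 0, so that each nonzero value starts a new group, and then flag the rows that fall within one hour of the first row of each group:
library(dplyr)
library(lubridate)
df %>%
  group_by(grp = cumsum(value > 0)) %>%  # each nonzero value starts a new group
  mutate(flag = if_else(value[1] > 0 & time <= time[1] + hours(1), "ON", "OFF")) %>%
  ungroup() %>%
  select(-grp)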
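As a cross-check, the same rule can also be sketched without any grouping, in base R only: carry forward the time of the most recent nonzero value with cummax and mark the rows that lie within 3600 seconds of it. The names flag_activity, window_secs and last_event are made up for this illustration and do not come from the answer above, and the sketch assumes that time is POSIXct and sorted in increasing order.

```r
# Hypothetical helper (not part of the original answer): flags rows that fall
# within `window_secs` seconds after the most recent nonzero value.
# Assumes `time` is POSIXct and sorted in increasing order.
flag_activity <- function(time, value, window_secs = 3600) {
  secs <- as.numeric(time)
  # Time of the latest nonzero value seen so far; -Inf before the first one
  last_event <- cummax(ifelse(value > 0, secs, -Inf))
  ifelse(secs - last_event <= window_secs, "ON", "OFF")
}

# Same dummy data as in the question, built with base R only
df <- data.frame(
  time = seq(as.POSIXct("2017-05-12 00:00:00", tz = "UTC"),
             as.POSIXct("2017-05-12 02:25:00", tz = "UTC"),
             by = "5 min"),
  value = c(rep(0, 10), 1500, 0, 1000, rep(0, 17))
)
df$flag <- flag_activity(df$time, df$value)
```

With the question's data this marks rows 11 through 25 (00:50 to 02:00) as ON and every other row as OFF. Because it only needs the time of the latest nonzero value, a new nonzero value inside an active window simply restarts the hour.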