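I have a dataset with a binary indicator of whether an event occurred. From it, I want to create a count of the number of consecutive time steps without an event. For example (TS = time step, EV = event indicator, C = count):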
TS1 -> TS2 -> TS3 -> TS4 -> TS5 ->...
EV0 -> EV0 -> EV1 -> EV0 -> EV0 ->...
C0 -> C1 -> C0 -> C0 -> C1 ->...
As an example data frame, consider:
labs <- c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C", "D", "D", "D", "D", "D")
time <- c(1,2,3,4 ,1,2,3,4 ,1,2,3,4 ,1,2,3,4,5)
event <- c(0,0,0,0, 0,1,0,0, 1,1,0,0, NA,0,0,1,0)
desiredOutcome <- c(0,1,2,3,0,0,0,1,0,0,0,1,NA,0,1,0,0) # goal
exDF <- data.frame(labs,time, event, desiredOutcome)
Based on the end goal and the data frame, I ended up with the following code:
library(dplyr)
exDF <- exDF %>%
  group_by(labs) %>%
  mutate(pe1 = lag(event, order_by = time)) # create new variable for prior event
exDF$count2 <- ifelse(
  ((exDF$pe1 == 1) & (exDF$event == 0)), # condition checks for rows where previous timestep is included & had event WHERE event is not ongoing in this timestep
  0,  # True val
  NA) # False val
exDF$count <- ifelse(
  (is.na(exDF$pe1) & (exDF$event == 0)), # condition checks for rows where previous timestep is not included & no current event
  0,           # True val
  exDF$count2) # False val
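This seems to fill in all the zeros correctly. However, I don't know of a good way to get from a column holding the appropriate 0s, with NA everywhere else, to my desired outcome.
Most of my experiments have involved combining mutate and lag, but they only fill in the next value (a lone 1 if a zero is in the input column, or a 2 if it is a 1). The following example doesn't try to handle the count reset, but it produces the behavior described above: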
exDF <- exDF %>%
  group_by(labs) %>%
  mutate(countFinal = lag(count, order_by = time) + 1)
So my challenge has to do with the order in which things are resolved. For the mutate command here, the order seems to be:
Pull all cell values by label -> Look at their lags -> Add 1 -> Done, but incorrectly
When I need it to be:
Pull first cell value by label -> Look at lag -> Add 1 or reset -> Pull second cell (filled in prior step) value by label -> Look at their lags -> Add 1 or reset -> Pull third... -> Done
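The difference between those two orders can be seen on a tiny vector. This is only a minimal sketch: the toy column below is made up for illustration and is not part of exDF.

```r
library(dplyr)

# Hypothetical toy column: one known 0 followed by values still to be filled.
count <- c(0, NA, NA, NA)

# Vectorized: lag() shifts the whole column once, so only one step gets filled.
one_pass <- lag(count) + 1
print(one_pass)  # values: NA, 1, NA, NA

# Sequential: each step reads the value that was written in the previous step.
filled <- count
for (i in seq_along(filled)[-1]) {
  if (is.na(filled[i])) filled[i] <- filled[i - 1] + 1
}
print(filled)  # values: 0, 1, 2, 3
```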
Is there a good way to do this with existing packages?
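I can't think of a more direct way, but this works. The workflow:
1. Recode NA in event as 2 in a temporary tmp column, so that missing values form runs of their own.
2. Use cumsum() on the changes in tmp to give each run of identical values within a label its own group id.
3. Group by labs and tmp, replace() the first value in each group with zero, and change the remaining non-zero group ids to 1.
4. Take cumsum() within each group to get the count, then set it back to NA wherever event is NA.
library(dplyr)
exDF |>
  group_by(labs) |>
  mutate(tmp = if_else(is.na(event), 2, event),         # NA becomes its own value
         tmp = cumsum(tmp != lag(tmp, default = 1))) |>  # run id within each label
  group_by(labs, tmp) |>
  # write to desiredOutcome here: dplyr does not allow modifying the grouping column tmp
  mutate(desiredOutcome = replace(tmp, 1, 0),                  # first row of each run is 0
         desiredOutcome = if_else(desiredOutcome != 0, 1, 0),  # later rows each add 1
         desiredOutcome = cumsum(desiredOutcome),              # running count within the run
         desiredOutcome = if_else(is.na(event), NA_real_, desiredOutcome)) |>
  ungroup() |>
  select(-tmp)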
# # A tibble: 17 × 4
#    labs   time event desiredOutcome
#    <chr> <dbl> <dbl>          <dbl>
#  1 A         1     0              0
#  2 A         2     0              1
#  3 A         3     0              2
#  4 A         4     0              3
#  5 B         1     0              0
#  6 B         2     1              0
#  7 B         3     0              0
#  8 B         4     0              1
#  9 C         1     1              0
# 10 C         2     1              0
# 11 C         3     0              0
# 12 C         4     0              1
# 13 D         1    NA             NA
# 14 D         2     0              0
# 15 D         3     0              1
# 16 D         4     1              0
# 17 D         5     0              0
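One caveat, as far as I can tell: counting positions within runs of identical values appears to give the second of two back-to-back events a 1 instead of a 0 when they occur in the middle of a group (in the example data the only consecutive events sit at the start of group C). Below is a minimal sketch of a more literal "add 1 or reset" pass. It assumes the rows can be sorted by time within each label and that event only holds 0, 1 or NA; the helper name count_since_event is hypothetical and not from any package.

```r
library(dplyr)

# Hypothetical helper: walk through one group's events in time order and
# either add 1 to the running count or reset it to 0.
count_since_event <- function(event) {
  out <- rep(NA_real_, length(event))
  prev_quiet <- FALSE  # TRUE when the previous row was a known non-event
  for (i in seq_along(event)) {
    if (is.na(event[i])) {
      prev_quiet <- FALSE  # unknown row: stays NA, counting restarts afterwards
    } else if (event[i] == 1) {
      out[i] <- 0          # event row: reset
      prev_quiet <- FALSE
    } else {
      # non-event row: continue the count only if the previous row was quiet
      out[i] <- if (prev_quiet) out[i - 1] + 1 else 0
      prev_quiet <- TRUE
    }
  }
  out
}

# Same example data as in the question, without the goal column.
exDF <- data.frame(
  labs  = c(rep("A", 4), rep("B", 4), rep("C", 4), rep("D", 5)),
  time  = c(1:4, 1:4, 1:4, 1:5),
  event = c(0, 0, 0, 0,  0, 1, 0, 0,  1, 1, 0, 0,  NA, 0, 0, 1, 0)
)

result <- exDF |>
  arrange(labs, time) |>   # make sure each group is in time order
  group_by(labs) |>
  mutate(count = count_since_event(event)) |>
  ungroup()
```

The loop is slower than a vectorized cumsum() approach on large data, but it follows the "add 1 or reset" description step by step, which makes the reset rules easy to change.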