I have a data frame set up as follows:
df <- data.frame("id" = c(111,111,111,222,222,222,222,333,333,333,333), "Location" = c("A","B","A","A","C","B","A","B","A","B","A"), "Encounter" = c(1,2,3,1,2,3,4,1,2,3,4))
    id Location Encounter
1  111        A         1
2  111        B         2
3  111        A         3
4  222        A         1
5  222        C         2
6  222        B         3
7  222        A         4
8  333        B         1
9  333        A         2
10 333        B         3
11 333        A         4
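I'm basically trying to create a binary flag, within each id group, that marks whether the Location already appeared in a previous Encounter. So it would look like: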
    id Location Encounter Flag
1  111        A         1    0
2  111        B         2    0
3  111        A         3    1
4  222        A         1    0
5  222        C         2    0
6  222        B         3    0
7  222        A         4    1
8  333        B         1    0
9  333        A         2    0
10 333        B         3    1
11 333        A         4    1
I was trying to figure out how to do this with an if statement, such as:
df$Flag <- case_when((df$id - lag(df$id)) == 0 ~ case_when(df$Location == lag(df$Location, 1)
| df$Location == lag(df$Location, 2)
| df$Location == lag(df$Location, 3) ~ 1, T ~ 0), T ~ 0)
    id Location Flag
1  111        A    0
2  111        B    0
3  111        A    1
4  222        A    0
5  222        C    0
6  222        B    0
7  222        A    1
8  333        B    0
9  333        A    1
10 333        B    1
11 333        A    1
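But this incorrectly assigns 1 to row 9, and the real data has cases with more than 15 encounters, so this approach gets very cumbersome. I was hoping to find a way to do something like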
lag(df$Location, 1:df$Encounter)
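but I know lag() needs a single integer for k, so that particular command won't work.
Any help would be greatly appreciated!
Also, first-time poster here, so please feel free to offer constructive criticism for future questions! :)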
Using data.table:
library(data.table)
df <- data.table("id" = c(111,111,111,222,222,222,222,333,333,333,333), "Location" = c("A","B","A","A","C","B","A","B","A","B","A"), "Encounter" = c(1,2,3,1,2,3,4,1,2,3,4))
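# Start with a 1 on every row so the rows can be counted
df[, flag:=1]
# Running count of rows within each id/Location pair (1 = first occurrence)
df[, flag:=cumsum(flag), by=.(id,Location)]
# A count above 1 means this Location already appeared for this id
df[, flag:=ifelse(flag>1,1,0)]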
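For comparison, here is a minimal base R sketch of the same idea. It is not part of the answer above. It assumes that rows are ordered by Encounter within each id, and that row 10 has Location "B" as in the printed tables. It relies on duplicated() applied to the id/Location pair, which is TRUE for every occurrence of a pair after the first.

```r
# Example data, matching the tables printed in the question
df <- data.frame(
  id        = c(111, 111, 111, 222, 222, 222, 222, 333, 333, 333, 333),
  Location  = c("A", "B", "A", "A", "C", "B", "A", "B", "A", "B", "A"),
  Encounter = c(1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4)
)

# Sort so that "earlier row" means "earlier Encounter" within the same id
df <- df[order(df$id, df$Encounter), ]

# duplicated() on the two columns marks each (id, Location) pair
# that has already been seen in an earlier row
df$Flag <- as.integer(duplicated(df[c("id", "Location")]))

print(df$Flag)
# 0 0 1 0 0 0 1 0 0 1 1
```

Because the check is on the (id, Location) pair rather than on a fixed number of lags, it should not matter how many encounters an id has, and a match can never come from a different id.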