Suppose I have the following data frame:
id flag time
1 1 2017-01-01 UTC--2017-01-07 UTC
1 0 2018-01-01 UTC--2019-01-01 UTC
1 0 2017-01-03 UTC--2017-01-09 UTC
2 1 2017-01-01 UTC--2017-01-15 UTC
2 1 2018-07-01 UTC--2018-09-01 UTC
2 1 2018-10-12 UTC--2018-10-20 UTC
2 0 2017-01-12 UTC--2017-01-16 UTC
2 0 2017-03-01 UTC--2017-03-15 UTC
2 0 2017-12-01 UTC--2017-12-31 UTC
2 0 2018-08-15 UTC--2018-09-19 UTC
2 0 2018-10-01 UTC--2018-10-21 UTC
It was created with the following code:
library(lubridate)

df <- data.frame(id = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2),
                 flag = c(1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0),
                 time = c(interval(ymd(20170101), ymd(20170107)),
                          interval(ymd(20180101), ymd(20190101)),
                          interval(ymd(20170103), ymd(20170109)),
                          # Cases
                          interval(ymd(20170101), ymd(20170115)),
                          interval(ymd(20180701), ymd(20180901)),
                          interval(ymd(20181012), ymd(20181020)),
                          # Controls
                          interval(ymd(20170112), ymd(20170116)),
                          interval(ymd(20170301), ymd(20170315)),
                          interval(ymd(20171201), ymd(20171231)),
                          interval(ymd(20180815), ymd(20180919)),
                          interval(ymd(20181001), ymd(20181021))))
And I would like to get this result:
id flag time value
1 1 2017-01-01 UTC--2017-01-07 UTC NA
1 0 2018-01-01 UTC--2019-01-01 UTC 0
1 0 2017-01-03 UTC--2017-01-09 UTC 1
2 1 2017-01-01 UTC--2017-01-15 UTC NA
2 1 2018-07-01 UTC--2018-09-01 UTC NA
2 1 2018-10-12 UTC--2018-10-20 UTC NA
2 0 2017-01-12 UTC--2017-01-16 UTC 1
2 0 2017-03-01 UTC--2017-03-15 UTC 0
2 0 2017-12-01 UTC--2017-12-31 UTC 0
2 0 2018-08-15 UTC--2018-09-19 UTC 1
2 0 2018-10-01 UTC--2018-10-21 UTC 1
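That is, within each id group I want to compare every interval with flag = 0 against all intervals with flag = 1, to see whether the flag 0 interval overlaps at least one flag 1 interval. Rows with flag = 1 should get NA.
To do this, I tried using the int_overlaps function from lubridate.
I tried the following code, but it does not work: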
result <- df %>%
  group_by(id) %>%
  mutate(value = ifelse(flag == 0 & int_overlaps(time, any(time[flag == 1])), 1, 0))
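For reference, here is a minimal sketch of how int_overlaps behaves on its own, assuming only that lubridate is loaded (the names `case`, `controls`, `hits` and `any_hit` are made up for illustration). It is vectorized and returns one logical per comparison, so `any()` presumably belongs around its result rather than around the intervals passed to it, as in the attempt above.

```r
library(lubridate)

# Illustrative intervals taken from id 1 in the data above
case <- interval(ymd(20170101), ymd(20170107))          # flag == 1
controls <- c(interval(ymd(20180101), ymd(20190101)),   # flag == 0
              interval(ymd(20170103), ymd(20170109)))   # flag == 0

# One logical per control interval (the single case is recycled)
hits <- int_overlaps(controls, case)

# Reduce to a single TRUE/FALSE only after the comparison has been made
any_hit <- any(hits)
```

With these values, only the second control interval overlaps the case interval.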
I found a very similar question:
R: Determine if each date interval overlaps with all other date intervals in a dataframe
You can use map_int from purrr to check whether any of the flag = 1 intervals within each id overlap a given interval:
library(tidyverse)
library(lubridate)
df %>%
  group_by(id) %>%
  mutate(value = ifelse(flag == 0, map_int(time, ~ any(int_overlaps(.x, time[flag == 1]))), NA))
Output:
# A tibble: 11 x 4
# Groups: id [2]
id flag time value
<dbl> <dbl> <Interval> <int>
1 1 1 2017-01-01 UTC--2017-01-07 UTC NA
2 1 0 2018-01-01 UTC--2019-01-01 UTC 0
3 1 0 2017-01-03 UTC--2017-01-09 UTC 1
4 2 1 2017-01-01 UTC--2017-01-15 UTC NA
5 2 1 2018-07-01 UTC--2018-09-01 UTC NA
6 2 1 2018-10-12 UTC--2018-10-20 UTC NA
7 2 0 2017-01-12 UTC--2017-01-16 UTC 1
8 2 0 2017-03-01 UTC--2017-03-15 UTC 0
9 2 0 2017-12-01 UTC--2017-12-31 UTC 0
10 2 0 2018-08-15 UTC--2018-09-19 UTC 1
11 2 0 2018-10-01 UTC--2018-10-21 UTC 1
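The same check can also be sketched without purrr or dplyr by looping over row positions rather than over the interval vector itself. This is only an illustrative variant under stated assumptions: `overlaps_any_case` is a hypothetical helper name, and the data are held in plain vectors instead of a data frame.

```r
library(lubridate)

id   <- c(1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2)
flag <- c(1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0)
time <- c(interval(ymd(20170101), ymd(20170107)),
          interval(ymd(20180101), ymd(20190101)),
          interval(ymd(20170103), ymd(20170109)),
          interval(ymd(20170101), ymd(20170115)),
          interval(ymd(20180701), ymd(20180901)),
          interval(ymd(20181012), ymd(20181020)),
          interval(ymd(20170112), ymd(20170116)),
          interval(ymd(20170301), ymd(20170315)),
          interval(ymd(20171201), ymd(20171231)),
          interval(ymd(20180815), ymd(20180919)),
          interval(ymd(20181001), ymd(20181021)))

# Hypothetical helper: for one group, return NA for flag == 1 rows and
# 1/0 for flag == 0 rows depending on whether they overlap any flag == 1 interval
overlaps_any_case <- function(time, flag) {
  cases <- time[flag == 1]
  vapply(seq_along(time), function(i) {
    if (flag[i] == 1) return(NA_integer_)
    as.integer(any(int_overlaps(time[i], cases)))
  }, integer(1))
}

# Apply the helper separately to the rows of each id
value <- rep(NA_integer_, length(id))
for (g in unique(id)) {
  rows <- which(id == g)
  value[rows] <- overlaps_any_case(time[rows], flag[rows])
}
```

Indexing with `time[i]` keeps each element an Interval, so the sketch does not depend on how a mapping function iterates over an Interval vector.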
I am adding another answer, adapted from here:
R: Determine if each date interval overlaps with all other date intervals in a dataframe
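result <- df %>%
  group_by(id) %>%
  mutate(value = map_int(seq_along(time), function(x) {
    # Row positions of the flag == 1 intervals in this group, excluding row x itself
    y <- setdiff(which(flag == 1), x)
    # Note: flag == 1 rows are compared with the other flag == 1 rows here, not set to NA
    as.integer(any(int_overlaps(time[x], time[y])))
  }))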