Matching overlapping time intervals by group

Question · Votes: 2 · Answers: 2

Suppose I have the following data frame:

id  flag            time
1   1   2017-01-01 UTC--2017-01-07 UTC
1   0   2018-01-01 UTC--2019-01-01 UTC
1   0   2017-01-03 UTC--2017-01-09 UTC
2   1   2017-01-01 UTC--2017-01-15 UTC
2   1   2018-07-01 UTC--2018-09-01 UTC
2   1   2018-10-12 UTC--2018-10-20 UTC
2   0   2017-01-12 UTC--2017-01-16 UTC
2   0   2017-03-01 UTC--2017-03-15 UTC
2   0   2017-12-01 UTC--2017-12-31 UTC
2   0   2018-08-15 UTC--2018-09-19 UTC
2   0   2018-10-01 UTC--2018-10-21 UTC

It is created with the following code:

library(lubridate)

df <- data.frame(id=c(1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2),
                  flag=c(1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0), 
                  time=c(interval(ymd(20170101), ymd(20170107)),
                       interval(ymd(20180101), ymd(20190101)), 
                       interval(ymd(20170103), ymd(20170109)), 
                       # Cases (flag = 1)
                       interval(ymd(20170101), ymd(20170115)), 
                       interval(ymd(20180701), ymd(20180901)),
                       interval(ymd(20181012), ymd(20181020)),
                       # Controls (flag = 0)
                       interval(ymd(20170112), ymd(20170116)),
                       interval(ymd(20170301), ymd(20170315)),
                       interval(ymd(20171201), ymd(20171231)),
                       interval(ymd(20180815), ymd(20180919)),
                       interval(ymd(20181001), ymd(20181021))))

I want to get this result:

id  flag            time              value
1   1   2017-01-01 UTC--2017-01-07 UTC  NA
1   0   2018-01-01 UTC--2019-01-01 UTC  0
1   0   2017-01-03 UTC--2017-01-09 UTC  1
2   1   2017-01-01 UTC--2017-01-15 UTC  NA
2   1   2018-07-01 UTC--2018-09-01 UTC  NA
2   1   2018-10-12 UTC--2018-10-20 UTC  NA
2   0   2017-01-12 UTC--2017-01-16 UTC  1
2   0   2017-03-01 UTC--2017-03-15 UTC  0
2   0   2017-12-01 UTC--2017-12-31 UTC  0
2   0   2018-08-15 UTC--2018-09-19 UTC  1
2   0   2018-10-01 UTC--2018-10-21 UTC  1

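That is, within each group (id), I want to compare each interval with flag = 0 against all of the intervals with flag = 1, to see whether it overlaps at least one of them (value = 1) or none of them (value = 0). Rows with flag = 1 get NA.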
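The overlap condition itself is simple. As a minimal sketch, assuming closed intervals given as plain Date start/end values (the helper name `intervals_overlap` is hypothetical, not a lubridate function), two intervals overlap when each one starts no later than the other one ends:

```r
# Hypothetical helper (not part of lubridate): two closed intervals
# [start1, end1] and [start2, end2] overlap when each one starts
# no later than the other one ends. Vectorised over its arguments.
intervals_overlap <- function(start1, end1, start2, end2) {
  start1 <= end2 & start2 <= end1
}

d <- as.Date
# id 1: second flag = 0 row against the flag = 1 row
intervals_overlap(d("2017-01-03"), d("2017-01-09"), d("2017-01-01"), d("2017-01-07"))  # TRUE
# id 1: first flag = 0 row against the flag = 1 row
intervals_overlap(d("2018-01-01"), d("2019-01-01"), d("2017-01-01"), d("2017-01-07"))  # FALSE
```

Whether intervals that only touch at an endpoint count as overlapping is a convention; the `<=` comparisons above treat them as overlapping, and lubridate's `int_overlaps()` documentation is the place to confirm its own behaviour for that case.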

To do this, I tried to use lubridate's int_overlaps function.

I tried the following code, but it does not work:

result <- df %>%
  group_by(id) %>%
  mutate(value = ifelse(flag == 0 & int_overlaps(time, any(time[flag == 1])), 1, 0))

I found a very similar approach here:

R: Determine if each date interval overlaps with all other date intervals in a dataframe

r group-by dplyr lubridate
2 answers
1 vote

You can use map_int from purrr to check, within each id, whether each flag = 0 interval overlaps any of the flag = 1 intervals:

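library(tidyverse)
library(lubridate)

df %>%
  group_by(id) %>%
  # for each flag == 0 interval, test it against every flag == 1 interval
  # of the same id; flag == 1 rows get NA
  mutate(value = ifelse(flag == 0,
                        map_int(time, ~ any(int_overlaps(.x, time[flag == 1]))),
                        NA))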

Output

# A tibble: 11 x 4
# Groups:   id [2]
      id  flag time                           value
   <dbl> <dbl> <Interval>                     <int>
 1     1     1 2017-01-01 UTC--2017-01-07 UTC    NA
 2     1     0 2018-01-01 UTC--2019-01-01 UTC     0
 3     1     0 2017-01-03 UTC--2017-01-09 UTC     1
 4     2     1 2017-01-01 UTC--2017-01-15 UTC    NA
 5     2     1 2018-07-01 UTC--2018-09-01 UTC    NA
 6     2     1 2018-10-12 UTC--2018-10-20 UTC    NA
 7     2     0 2017-01-12 UTC--2017-01-16 UTC     1
 8     2     0 2017-03-01 UTC--2017-03-15 UTC     0
 9     2     0 2017-12-01 UTC--2017-12-31 UTC     0
10     2     0 2018-08-15 UTC--2018-09-19 UTC     1
11     2     0 2018-10-01 UTC--2018-10-21 UTC     1
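As a cross-check of this output without purrr or lubridate, here is a minimal base-R sketch. It assumes the intervals are stored as two Date columns, `start` and `end` (hypothetical names replacing the Interval column), and treats them as closed intervals. Like the answer, it tests each flag = 0 interval against every flag = 1 interval of the same id and then applies "any" to those pairwise results, here via `rowSums()` on an overlap matrix built with `outer()`.

```r
# Example data re-created with plain Date columns (hypothetical layout:
# `start`/`end` instead of a lubridate Interval column).
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2),
  flag  = c(1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0),
  start = as.Date(c("2017-01-01", "2018-01-01", "2017-01-03",
                    "2017-01-01", "2018-07-01", "2018-10-12",
                    "2017-01-12", "2017-03-01", "2017-12-01",
                    "2018-08-15", "2018-10-01")),
  end   = as.Date(c("2017-01-07", "2019-01-01", "2017-01-09",
                    "2017-01-15", "2018-09-01", "2018-10-20",
                    "2017-01-16", "2017-03-15", "2017-12-31",
                    "2018-09-19", "2018-10-21"))
)

i <- seq_len(nrow(df))
# overlap[a, b] is TRUE when rows a and b share an id, row b has
# flag == 1, and the two closed intervals overlap.
overlap <- outer(i, i, function(a, b) {
  df$id[a] == df$id[b] & df$flag[b] == 1 &
    df$start[a] <= df$end[b] & df$start[b] <= df$end[a]
})

# 1 if a flag == 0 row overlaps at least one flag == 1 row of its id,
# 0 otherwise; flag == 1 rows get NA, as in the desired output.
df$value <- ifelse(df$flag == 0, as.integer(rowSums(overlap) > 0), NA)
df$value
```

The id check matters: for example, the id 1 interval 2018-01-01 to 2019-01-01 overlaps an id 2 case interval, but it must still get 0. Because `outer()` materialises an n × n matrix over all rows, this sketch only suits modest data sizes; the row-by-row `map_int()` approach avoids building that matrix.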

0 votes

I am adding another answer, adapted from here:

R: Determine if each date interval overlaps with all other date intervals in a dataframe

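result <- df %>%
  group_by(id) %>%
  mutate(value = ifelse(flag == 0,
    map_int(seq_along(time), function(x) {
      # row positions of the flag == 1 intervals in this group, excluding row x
      y <- setdiff(which(flag == 1), x)
      as.integer(any(int_overlaps(time[x], time[y])))
    }),
    NA))  # flag == 1 rows get NA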
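A detail to watch in this approach is how the flag = 1 rows are indexed: `seq_along(time[flag == 1])` gives positions 1, ..., k within the subset, while `which(flag == 1)` gives their row positions within the group, which is what `time[y]` needs. The two coincide for the example data only because the flag = 1 rows come first in each id. A tiny base-R check with a hypothetical group whose flag = 1 rows are not first:

```r
# Hypothetical group in which the flag == 1 rows do not come first
flag <- c(0, 1, 0, 1)

seq_along(flag[flag == 1])  # 1 2 : positions within the subset
which(flag == 1)            # 2 4 : row positions within the group
```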
© www.soinside.com 2019 - 2024. All rights reserved.