Group records with overlapping time intervals


I have a data frame (N = 16) with the columns ID (character), w_from (date), and w_to (date). Each record represents a task.

Here is the data in R.

ID <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2)

w_from <- c("2010-01-01","2010-01-05","2010-01-29","2010-01-29",
            "2010-03-01","2010-03-15","2010-07-15","2010-09-10",
            "2010-11-01","2010-11-30","2010-12-15","2010-12-31",
            "2011-02-01","2012-04-01","2011-07-01","2011-07-01")

w_to <- c("2010-01-31","2010-01-15", "2010-02-13","2010-02-28",
          "2010-03-16","2010-03-16","2010-08-14","2010-10-10",
          "2010-12-01","2010-12-30","2010-12-20","2011-02-19",
          "2011-03-23","2012-06-30","2011-07-31","2011-07-06")

df <- data.frame(ID, w_from, w_to)
df$w_from <- as.Date(df$w_from)
df$w_to <- as.Date(df$w_to)

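I need to generate a group number, per ID, for records whose time intervals overlap. Overlap is transitive here: if record #1 overlaps record #2, and record #2 overlaps record #3, then records #1, #2, and #3 all belong to the same group.

Likewise, if record #1 overlaps both record #2 and record #3, but record #2 does not overlap record #3, then records #1, #2, and #3 still all belong to the same group.

In the example above, for ID = 1, the first four records overlap.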
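
To make the rule concrete, here is a minimal sketch that checks the first four records of ID = 1 pairwise. It assumes two intervals overlap when they share at least one day (closed intervals); the helper name `overlaps` is made up for this illustration and is not part of any package.

```r
# Hypothetical helper: TRUE when the closed intervals [from1, to1] and
# [from2, to2] share at least one day.
overlaps <- function(from1, to1, from2, to2) {
  from1 <= to2 & from2 <= to1
}

# First four records of ID = 1 from the data above
w_from <- as.Date(c("2010-01-01", "2010-01-05", "2010-01-29", "2010-01-29"))
w_to   <- as.Date(c("2010-01-31", "2010-01-15", "2010-02-13", "2010-02-28"))

# Pairwise overlap matrix: entry [i, j] is TRUE when record i overlaps record j
idx <- seq_along(w_from)
m <- outer(idx, idx, function(i, j) {
  overlaps(w_from[i], w_to[i], w_from[j], w_to[j])
})
print(m)
```

Record #2 does not directly overlap record #3 or record #4, but all three overlap record #1, so the four records end up in one group.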


The desired final output was posted as an image, which is not available here.

Also, it would be great if this could be done with dplyr!

r
1 Answer

Try this:

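library(dplyr)

df %>%
  group_by(ID) %>%
  arrange(ID, w_from) %>%   # sort by start date within each ID
  mutate(group = 1 + cumsum(
    # a new group starts when the latest end date seen so far is before this start date
    cummax(lag(as.numeric(w_to), default = first(as.numeric(w_to)))) < as.numeric(w_from)))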

# A tibble: 16 x 4
# Groups:   ID [2]
      ID w_from     w_to       group
   <dbl> <date>     <date>     <dbl>
 1     1 2010-01-01 2010-01-31     1
 2     1 2010-01-05 2010-01-15     1
 3     1 2010-01-29 2010-02-13     1
 4     1 2010-01-29 2010-02-28     1
 5     1 2010-03-01 2010-03-16     2
 6     1 2010-03-15 2010-03-16     2
 7     1 2010-07-15 2010-08-14     3
 8     1 2010-09-10 2010-10-10     4
 9     1 2010-11-01 2010-12-01     5
10     1 2010-11-30 2010-12-30     5
11     1 2010-12-15 2010-12-20     5
12     1 2010-12-31 2011-02-19     6
13     1 2011-02-01 2011-03-23     6
14     2 2011-07-01 2011-07-31     1
15     2 2011-07-01 2011-07-06     1
16     2 2012-04-01 2012-06-30     2
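
To see why the one-liner works, it can help to split it into named intermediate columns. The following is only a sketch on the first six records of ID = 1; the column names `prev_end`, `latest_end`, and `new_group` are introduced here for illustration and do not appear in the answer.

```r
library(dplyr)

# First six records of ID = 1 from the question's data
df <- data.frame(
  ID     = c(1, 1, 1, 1, 1, 1),
  w_from = as.Date(c("2010-01-01", "2010-01-05", "2010-01-29",
                     "2010-01-29", "2010-03-01", "2010-03-15")),
  w_to   = as.Date(c("2010-01-31", "2010-01-15", "2010-02-13",
                     "2010-02-28", "2010-03-16", "2010-03-16"))
)

steps <- df %>%
  group_by(ID) %>%
  arrange(ID, w_from) %>%
  mutate(
    # end date of the previous record (the first record reuses its own end date)
    prev_end   = lag(as.numeric(w_to), default = first(as.numeric(w_to))),
    # latest end date among all earlier records in this ID
    latest_end = as.Date(cummax(prev_end), origin = "1970-01-01"),
    # TRUE when this record starts after every earlier record has ended
    new_group  = latest_end < w_from,
    # each TRUE bumps the group counter by one
    group      = 1 + cumsum(new_group)
  ) %>%
  ungroup()

print(as.data.frame(select(steps, w_from, w_to, latest_end, new_group, group)))
```

The running maximum is what handles the transitive case: for the third record, the previous record ends on 2010-01-15, but the latest end seen so far is 2010-01-31, so the record stays in group 1. Because the comparison is a strict `<`, a record that starts on the same day an earlier one ends joins that group, while a record that starts the day after (2010-03-01 following 2010-02-28) opens a new one.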