I have a data frame (N = 16) containing ID (character), w_from (date), and w_to (date). Each record represents a task.
Here is the data in R.
ID <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2)
w_from <- c("2010-01-01","2010-01-05","2010-01-29","2010-01-29",
"2010-03-01","2010-03-15","2010-07-15","2010-09-10",
"2010-11-01","2010-11-30","2010-12-15","2010-12-31",
"2011-02-01","2012-04-01","2011-07-01","2011-07-01")
w_to <- c("2010-01-31","2010-01-15", "2010-02-13","2010-02-28",
"2010-03-16","2010-03-16","2010-08-14","2010-10-10",
"2010-12-01","2010-12-30","2010-12-20","2011-02-19",
"2011-03-23","2012-06-30","2011-07-31","2011-07-06")
df <- data.frame(ID, w_from, w_to)
df$w_from <- as.Date(df$w_from)
df$w_to <- as.Date(df$w_to)
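Within each ID, I need to generate a group number for records whose time intervals overlap. Overlap is treated as transitive: if record #1 overlaps record #2, and record #2 overlaps record #3, then records #1, #2, and #3 all belong to the same group.
Likewise, if record #1 overlaps both record #2 and record #3, but record #2 does not overlap record #3, then records #1, #2, and #3 still all belong to the same group.
In the example above, for ID = 1, the first four records overlap.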
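To make the rule concrete, here is a minimal sketch that checks pairwise overlap for the first four records of ID = 1. The helper `overlaps()` is a hypothetical name introduced only for illustration, and it assumes closed intervals, so two records that share a single day count as overlapping.

```r
# First four records for ID = 1, copied from the data above
w_from <- as.Date(c("2010-01-01", "2010-01-05", "2010-01-29", "2010-01-29"))
w_to   <- as.Date(c("2010-01-31", "2010-01-15", "2010-02-13", "2010-02-28"))

# Hypothetical helper (not part of any package): two closed intervals
# overlap when each one starts no later than the other one ends
overlaps <- function(i, j) {
  w_from[i] <= w_to[j] && w_from[j] <= w_to[i]
}

overlaps(1, 2)  # TRUE
overlaps(1, 3)  # TRUE
overlaps(2, 3)  # FALSE: record 2 ends 2010-01-15, record 3 starts 2010-01-29
overlaps(1, 4)  # TRUE
```

Records #2 and #3 do not overlap each other, but both overlap record #1, so all of them should end up in one group.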
Here is the final output:
Also, it would be great if this could be done with dplyr!
Try this:
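library(dplyr)

df %>%
  group_by(ID) %>%
  arrange(w_from) %>%  # rows within each ID end up ordered by start date
  mutate(group = 1 + cumsum(
    # TRUE (start a new group) when the latest end date seen so far in
    # this ID is earlier than the current row's start date
    cummax(lag(as.numeric(w_to), default = first(as.numeric(w_to)))) < as.numeric(w_from)))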
# A tibble: 16 x 4
# Groups:   ID [2]
      ID w_from     w_to       group
   <dbl> <date>     <date>     <dbl>
 1     1 2010-01-01 2010-01-31     1
 2     1 2010-01-05 2010-01-15     1
 3     1 2010-01-29 2010-02-13     1
 4     1 2010-01-29 2010-02-28     1
 5     1 2010-03-01 2010-03-16     2
 6     1 2010-03-15 2010-03-16     2
 7     1 2010-07-15 2010-08-14     3
 8     1 2010-09-10 2010-10-10     4
 9     1 2010-11-01 2010-12-01     5
10     1 2010-11-30 2010-12-30     5
11     1 2010-12-15 2010-12-20     5
12     1 2010-12-31 2011-02-19     6
13     1 2011-02-01 2011-03-23     6
14     2 2011-07-01 2011-07-31     1
15     2 2011-07-01 2011-07-06     1
16     2 2012-04-01 2012-06-30     2
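To see why the `cummax()` is needed, the one-line expression can be unpacked into intermediate columns. The following is only a sketch on a small subset of the ID = 1 rows. The names `ex`, `steps`, `prev_end`, `latest_end`, and `new_group` are made up for illustration, and sorting with `arrange(ID, w_from)` before grouping is an assumption about the desired row order, not something the original code does.

```r
library(dplyr)

# Small illustrative subset: four rows taken from ID = 1 in the question's data
ex <- data.frame(
  ID     = 1,
  w_from = as.Date(c("2010-01-01", "2010-01-05", "2010-01-29", "2010-03-01")),
  w_to   = as.Date(c("2010-01-31", "2010-01-15", "2010-02-13", "2010-03-16"))
)

steps <- ex %>%
  arrange(ID, w_from) %>%
  group_by(ID) %>%
  mutate(
    # end date of the previous row (the first row falls back to its own end date)
    prev_end   = lag(as.numeric(w_to), default = first(as.numeric(w_to))),
    # latest end date among all earlier rows in this ID
    latest_end = cummax(prev_end),
    # a gap exists when every earlier interval ended before this one starts
    new_group  = latest_end < as.numeric(w_from),
    group      = 1 + cumsum(new_group)
  ) %>%
  ungroup()

steps$group  # 1 1 1 2
```

The third row (starting 2010-01-29) begins after the second row ends (2010-01-15), so comparing against `prev_end` alone would split it off. The running maximum still carries the first row's end date (2010-01-31), which keeps it in group 1. Because the comparison is a strict `<`, an interval that starts on the same day an earlier one ends is treated as overlapping.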