Calculate days from start in R

Question · Votes: 0 · Answers: 2

I am looking for a way to calculate the number of days each participant (id) has spent in a study.

The example data looks like this:

data <- data.frame(date = as.Date(c("2020-11-29", "2020-11-30", "2020-12-02", 
                                    "2020-12-04", "2020-12-05", "2020-12-08",
                                    "2020-11-22", "2020-11-21", "2020-11-24", 
                                    "2020-11-25", "2020-11-30", "2020-11-29",
                                    "2021-01-29", "2021-01-20", "2021-01-30", 
                                    "2021-02-01", "2021-02-04", "2021-02-04")),
                   id = rep(1:3, each = 6))

data <- dplyr::arrange(data, id, date)

data

         date id
1  2020-11-29  1
2  2020-11-30  1
3  2020-12-02  1
4  2020-12-04  1
5  2020-12-05  1
6  2020-12-08  1
7  2020-11-21  2
8  2020-11-22  2
9  2020-11-24  2
10 2020-11-25  2
11 2020-11-29  2
12 2020-11-30  2
13 2021-01-20  3
14 2021-01-29  3
15 2021-01-30  3
16 2021-02-01  3
17 2021-02-04  3
18 2021-02-04  3

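What I want is a new column, days_from_start, which takes the first day within each id and sets it to 0. It should then count the number of days elapsed since that first day for every other row within the same id. Something like this: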

data$days_from_start <- c(0, 1, 3, 5, 6, 9,
                          0, 1, 3, 4, 8, 9, 
                          0, 9, 10, 12, 15, 15)

data

         date id days_from_start
1  2020-11-29  1               0
2  2020-11-30  1               1
3  2020-12-02  1               3
4  2020-12-04  1               5
5  2020-12-05  1               6
6  2020-12-08  1               9
7  2020-11-21  2               0
8  2020-11-22  2               1
9  2020-11-24  2               3
10 2020-11-25  2               4
11 2020-11-29  2               8
12 2020-11-30  2               9
13 2021-01-20  3               0
14 2021-01-29  3               9
15 2021-01-30  3              10
16 2021-02-01  3              12
17 2021-02-04  3              15
18 2021-02-04  3              15

Any ideas?

Thank you.

r date dplyr
2 Answers
3 votes

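Simply group the data by id, work out the earliest date for each id, and then calculate the difference between each row's date and that earliest date.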

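library(dplyr)

# Sort by participant and date so rows appear in chronological order per id
data <- arrange(data, id, date)

data %>%
  group_by(id) %>% 
  mutate(
    start_date = min(date),                          # earliest date within each id
    days_from_start = as.numeric(date - start_date)  # Date difference, in days
  ) %>% 
  ungroup() %>% 
  select(-start_date)                                # drop the helper column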
# A tibble: 18 x 3
   date          id days_from_start
   <date>     <int>           <dbl>
 1 2020-11-29     1               0
 2 2020-11-30     1               1
 3 2020-12-02     1               3
 4 2020-12-04     1               5
 5 2020-12-05     1               6
 6 2020-12-08     1               9
 7 2020-11-21     2               0
 8 2020-11-22     2               1
 9 2020-11-24     2               3
10 2020-11-25     2               4
11 2020-11-29     2               8
12 2020-11-30     2               9
13 2021-01-20     3               0
14 2021-01-29     3               9
15 2021-01-30     3              10
16 2021-02-01     3              12
17 2021-02-04     3              15
18 2021-02-04     3              15
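
For readers who would rather not load dplyr, the same grouping idea can probably be sketched in base R with ave(). This is only an illustration that assumes the example data from the question; the anonymous function passed to FUN is my own choice, not part of the original answer.

```r
# Example data from the question
data <- data.frame(date = as.Date(c("2020-11-29", "2020-11-30", "2020-12-02",
                                    "2020-12-04", "2020-12-05", "2020-12-08",
                                    "2020-11-22", "2020-11-21", "2020-11-24",
                                    "2020-11-25", "2020-11-30", "2020-11-29",
                                    "2021-01-29", "2021-01-20", "2021-01-30",
                                    "2021-02-01", "2021-02-04", "2021-02-04")),
                   id = rep(1:3, each = 6))

# Sort by id and date, mirroring dplyr::arrange(data, id, date)
data <- data[order(data$id, data$date), ]

# Dates are stored as day counts, so subtracting the per-id minimum
# gives the number of days since each participant's first date
data$days_from_start <- ave(as.numeric(data$date), data$id,
                            FUN = function(x) x - min(x))

data
```

Because the result is assigned to data$days_from_start, the new column is stored in the data frame directly.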

0 votes

Thanks for the solution above. What code do I need to add so that the "days_from_start" variable is actually added to my dataset?
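
One likely answer: a dplyr pipeline returns a new tibble rather than modifying data in place, so the result has to be assigned back to a name. The sketch below assumes the example data from the question and reuses the name data for the result; any other name would work equally well.

```r
library(dplyr)

# Example data from the question
data <- data.frame(date = as.Date(c("2020-11-29", "2020-11-30", "2020-12-02",
                                    "2020-12-04", "2020-12-05", "2020-12-08",
                                    "2020-11-22", "2020-11-21", "2020-11-24",
                                    "2020-11-25", "2020-11-30", "2020-11-29",
                                    "2021-01-29", "2021-01-20", "2021-01-30",
                                    "2021-02-01", "2021-02-04", "2021-02-04")),
                   id = rep(1:3, each = 6))

# Assigning with `data <-` stores the pipeline's result, so the new
# column persists instead of only being printed to the console
data <- data %>%
  arrange(id, date) %>%
  group_by(id) %>%
  mutate(days_from_start = as.numeric(date - min(date))) %>%
  ungroup()

names(data)
```

After this runs, data contains the columns date, id and days_from_start.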
