Subject-level filtering on date ranges across two data frames using dplyr

Question

I would like to filter the data in one data frame (Target) based on whether a value in another data frame (Reference) falls between two dates in the Target data frame.

For example, this would be my original data (Target):

ParticipantId	Date1	Date2
10001	1/2/2010	1/2/2015
10001	3/2/2016	1/2/2018
10001	1/2/2019	1/2/2020
10001	1/2/2021	1/2/2023
10002	1/2/2016	1/2/2018
10002	1/2/2019	1/2/2020
10002	1/2/2021	1/2/2023
10003	1/2/2013	1/2/2020
10003	1/2/2021	1/2/2023

This would be my reference data (Reference):

ParticipantId	DateA
10001	3/12/2013
10002	5/15/2022
10003	9/20/2022

What I want is a filtered version of Target that keeps only the rows where DateA from Reference falls between Date1 and Date2 in Target, as shown below:

ParticipantId	Date1	Date2
10001	1/2/2010	1/2/2015
10002	1/2/2021	1/2/2023
10003	1/2/2021	1/2/2023

I would be grateful if someone could show me how to do this using dplyr and pipes.

The code to create the data frames is below, although you may need to load the lubridate library.

Reference <- structure(
  list(
    ParticipantId = 10001:10003,
    DateA = c("3/12/2013", "5/15/2022", "9/20/2022")
  ),
  class = "data.frame",
  row.names = c(NA, -3L)
)

Target <- structure(
  list(
    ParticipantId = c(
      10001L,
      10001L,
      10001L,
      10001L,
      10002L,
      10002L,
      10002L,
      10003L,
      10003L
    ),
    Date1 = c(
      "1/2/2010",
      "1/2/2016",
      "1/2/2019",
      "1/2/2021",
      "1/2/2016",
      "1/2/2019",
      "1/2/2021",
      "1/2/2019",
      "1/2/2021"
    ),
    Date2 = c(
      "1/2/2015",
      "1/2/2018",
      "1/2/2020",
      "1/2/2023",
      "1/2/2018",
      "1/2/2020",
      "1/2/2023",
      "1/2/2020",
      "1/2/2023"
    )
  ),
  class = "data.frame",
  row.names = c(NA, -9L)
)
Answer 1

First, convert the date columns from character strings to dates. Since they are in m/d/YYYY format, you can do this with lubridate::mdy.
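
A quick way to confirm how mdy() reads these strings is to parse a couple of them directly. This is a small sketch using two values from the sample data; dmy() appears only for contrast, to show that the same string yields a different date under day-first parsing.

```r
library(lubridate)

# mdy() reads month/day/year, so "3/12/2013" becomes 12 March 2013
parsed <- mdy(c("1/2/2010", "3/12/2013"))
print(parsed)
print(class(parsed))

# For contrast only: dmy() reads the same string day-first (3 December 2013)
day_first <- dmy("3/12/2013")
print(day_first)
```

Both functions return Date objects, so the parsed columns can be compared directly with >= and <= (and therefore with between()).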

library(dplyr)
library(lubridate)
Reference <- structure(
  list(
    ParticipantId = 10001:10003,
    DateA = c("3/12/2013", "5/15/2022", "9/20/2022")
  ),
  class = "data.frame",
  row.names = c(NA, -3L)
)

Target <- data.frame(
    ParticipantId = c(
      10001L,
      10001L,
      10001L,
      10001L,
      10002L,
      10002L,
      10002L,
      10003L,
      10003L
    ),
    Date1 = c(
      "1/2/2010",
      "1/2/2016",
      "1/2/2019",
      "1/2/2021",
      "1/2/2016",
      "1/2/2019",
      "1/2/2021",
      "1/2/2019",
      "1/2/2021"
    ),
    Date2 = c(
      "1/2/2015",
      "1/2/2018",
      "1/2/2020",
      "1/2/2023",
      "1/2/2018",
      "1/2/2020",
      "1/2/2023",
      "1/2/2020",
      "1/2/2023"
    )
  )

Target$Date1 <- mdy(Target$Date1)
Target$Date2 <- mdy(Target$Date2)
Reference$DateA <- mdy(Reference$DateA)
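
Because the question asks for a pipe-based approach, the three $ assignments can also be written with dplyr::mutate() and across(). The sketch below uses a hypothetical two-row subset, target_demo, so that it runs on its own; the same call should work on the full Target, and Reference can be converted the same way with mutate(DateA = mdy(DateA)).

```r
library(dplyr)
library(lubridate)

# Hypothetical two-row stand-in for Target, same character date format
target_demo <- data.frame(
  ParticipantId = c(10001L, 10002L),
  Date1 = c("1/2/2010", "1/2/2021"),
  Date2 = c("1/2/2015", "1/2/2023")
)

# Apply mdy() to both date columns in a single mutate() step
target_demo <- target_demo %>%
  mutate(across(c(Date1, Date2), mdy))

str(target_demo)
```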

Next, join the data so that each ParticipantId in Target has its corresponding DateA from Reference.

joined_data <- left_join(Target, Reference, by = join_by(ParticipantId)) 
print(joined_data)
  ParticipantId      Date1      Date2      DateA
1         10001 2010-01-02 2015-01-02 2013-03-12
2         10001 2016-01-02 2018-01-02 2013-03-12
3         10001 2019-01-02 2020-01-02 2013-03-12
4         10001 2021-01-02 2023-01-02 2013-03-12
5         10002 2016-01-02 2018-01-02 2022-05-15
6         10002 2019-01-02 2020-01-02 2022-05-15
7         10002 2021-01-02 2023-01-02 2022-05-15
8         10003 2019-01-02 2020-01-02 2022-09-20
9         10003 2021-01-02 2023-01-02 2022-09-20
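
The filtering step relies on each participant having exactly one DateA, so it may be worth asking the join to verify that. This is a sketch assuming dplyr 1.1.0 or later (the release that introduced join_by(), already used above, and the relationship argument); target_demo, reference_ok and reference_dup are hypothetical toy tables, not the question's data.

```r
library(dplyr)

# Hypothetical toy tables: several target rows per participant
target_demo <- data.frame(ParticipantId = c(10001L, 10001L, 10002L))
reference_ok <- data.frame(
  ParticipantId = c(10001L, 10002L),
  DateA = as.Date(c("2013-03-12", "2022-05-15"))
)
# Hypothetical bad input: participant 10001 appears twice
reference_dup <- rbind(
  reference_ok,
  data.frame(ParticipantId = 10001L, DateA = as.Date("2014-01-01"))
)

# "many-to-one": each target row may match at most one reference row
joined_ok <- left_join(target_demo, reference_ok,
                       by = join_by(ParticipantId),
                       relationship = "many-to-one")
print(nrow(joined_ok))  # 3, one output row per target row

# The same call on the duplicated reference stops with an error
dup_error <- tryCatch(
  left_join(target_demo, reference_dup,
            by = join_by(ParticipantId),
            relationship = "many-to-one"),
  error = function(e) e
)
print(inherits(dup_error, "error"))
```

Without the check, a duplicated ParticipantId in the reference table would silently duplicate Target rows in the joined result.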

Finally, use dplyr::filter() and dplyr::between() to keep only the records where DateA falls between Date1 and Date2 (both bounds inclusive).

joined_data_between_dates_1_and_2 <- filter(joined_data, between(DateA, Date1, Date2)) 
print(joined_data_between_dates_1_and_2)
  ParticipantId      Date1      Date2      DateA
1         10001 2010-01-02 2015-01-02 2013-03-12
2         10002 2021-01-02 2023-01-02 2022-05-15
3         10003 2021-01-02 2023-01-02 2022-09-20
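
Putting the steps together, the whole answer can be written as a single dplyr pipeline, which is closer to what the question asked for. This sketch rebuilds the sample data with plain data.frame() calls so it runs standalone, and adds select(-DateA) (an addition not in the answer above) so the columns match the desired output in the question.

```r
library(dplyr)
library(lubridate)

Reference <- data.frame(
  ParticipantId = 10001:10003,
  DateA = c("3/12/2013", "5/15/2022", "9/20/2022")
)

Target <- data.frame(
  ParticipantId = c(10001L, 10001L, 10001L, 10001L,
                    10002L, 10002L, 10002L, 10003L, 10003L),
  Date1 = c("1/2/2010", "1/2/2016", "1/2/2019", "1/2/2021",
            "1/2/2016", "1/2/2019", "1/2/2021", "1/2/2019", "1/2/2021"),
  Date2 = c("1/2/2015", "1/2/2018", "1/2/2020", "1/2/2023",
            "1/2/2018", "1/2/2020", "1/2/2023", "1/2/2020", "1/2/2023")
)

result <- Target %>%
  # parse both range columns as Dates
  mutate(across(c(Date1, Date2), mdy)) %>%
  # attach each participant's DateA (parsed inside the join call)
  left_join(
    Reference %>% mutate(DateA = mdy(DateA)),
    by = join_by(ParticipantId)
  ) %>%
  # keep rows whose range contains DateA, bounds inclusive
  filter(between(DateA, Date1, Date2)) %>%
  # drop the helper column to match the requested output
  select(-DateA)

print(result)
```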
