Row_Num | Patient_ID | Start_Date | Start_Window |
---|---|---|---|
1 | 001 | 2023-04-01 | 2023-04-01 |
2 | 001 | 2023-04-03 | NA |
3 | 001 | 2023-04-05 | NA |
4 | 001 | 2023-04-06 | NA |
5 | 001 | 2023-04-08 | NA |
6 | 001 | 2023-04-09 | 2023-04-09 |
7 | 001 | 2023-04-11 | 2023-04-11 |
8 | 001 | 2023-04-13 | NA |
9 | 001 | 2023-04-16 | NA |
10 | 001 | 2023-04-18 | 2023-04-18 |
11 | 002 | 2023-04-02 | 2023-04-02 |
12 | 002 | 2023-04-04 | 2023-04-04 |
13 | 002 | 2023-04-07 | 2023-04-07 |
14 | 002 | 2023-04-08 | 2023-04-08 |
15 | 002 | 2023-04-10 | NA |
```r
# Create the data frame
# (001 and 002 are numeric literals, so Patient_ID prints as 1 and 2)
df <- data.frame(
  Row_Num = 1:15,
  Patient_ID = c(rep(001, 10), rep(002, 5)),
  Start_Date = c("2023-04-01", "2023-04-03", "2023-04-05", "2023-04-06", "2023-04-08",
                 "2023-04-09", "2023-04-11", "2023-04-13", "2023-04-16", "2023-04-18",
                 "2023-04-02", "2023-04-04", "2023-04-07", "2023-04-08", "2023-04-10"),
  Start_Window = c("2023-04-01", NA, NA, NA, NA, "2023-04-09", "2023-04-11", NA, NA, "2023-04-18",
                   "2023-04-02", "2023-04-04", "2023-04-07", "2023-04-08", NA)
)

# Print the table
print(df)
```
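I have the table above, which is created by the reproducible R code above.
I want to replace the `NA`s in `Start_Window` with values from `Start_Date`. However, when a patient has consecutive `NA`s, I want every row in that run to take the `Start_Date` of the run's last row, i.e. carry that value backward.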
So in this example, `Start_Window` should be "2023-04-08" for rows 2-5, "2023-04-16" for rows 8-9, and "2023-04-10" for row 15.
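You can create a group ID from the runs of `Patient_ID` and `Start_Window` (consecutive `NA`s within a patient count as one run), and for the `NA`s take the last `Start_Date` of that group. This needs dplyr 1.1.0 or later, which introduced `consecutive_id()` and the `.by` argument.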
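To see what the grouping step produces on its own, here is a small sketch that runs only the `consecutive_id()` call on patient 1's rows (rows 1-10). The `toy` data frame is a hypothetical trimmed copy of the question's data made for illustration, and the sketch assumes dplyr >= 1.1.0 is installed.

```r
library(dplyr)  # consecutive_id() requires dplyr >= 1.1.0

# Hypothetical trimmed copy of patient 1's rows (rows 1-10);
# Start_Date is left out because only the grouping step is shown
toy <- data.frame(
  Patient_ID   = 1,
  Start_Window = c("2023-04-01", NA, NA, NA, NA,
                   "2023-04-09", "2023-04-11", NA, NA, "2023-04-18")
)

# Same grouping call as in the answer, kept as a visible column
toy <- toy %>% mutate(temp_id = consecutive_id(Start_Window, Patient_ID))
print(toy)
```

Rows 2-5 share `temp_id` 2 and rows 8-9 share `temp_id` 5, so `last(Start_Date)` evaluated within those groups picks the `Start_Date` of row 5 and row 9 respectively.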
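```r
library(dplyr)  # consecutive_id() and .by need dplyr >= 1.1.0

df %>%
  # one id per run of (Start_Window, Patient_ID); consecutive NAs share an id
  mutate(temp_id = consecutive_id(Start_Window, Patient_ID)) %>%
  # within each run, fill NAs with the run's last Start_Date
  mutate(Start_Window = if_else(is.na(Start_Window), last(Start_Date), Start_Window),
         .by = temp_id) %>%
  select(-temp_id)
```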
```
   Row_Num Patient_ID Start_Date Start_Window
1        1          1 2023-04-01   2023-04-01
2        2          1 2023-04-03   2023-04-08
3        3          1 2023-04-05   2023-04-08
4        4          1 2023-04-06   2023-04-08
5        5          1 2023-04-08   2023-04-08
6        6          1 2023-04-09   2023-04-09
7        7          1 2023-04-11   2023-04-11
8        8          1 2023-04-13   2023-04-16
9        9          1 2023-04-16   2023-04-16
10      10          1 2023-04-18   2023-04-18
11      11          2 2023-04-02   2023-04-02
12      12          2 2023-04-04   2023-04-04
13      13          2 2023-04-07   2023-04-07
14      14          2 2023-04-08   2023-04-08
15      15          2 2023-04-10   2023-04-10
```
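If dplyr 1.1.0 or later is not available, the same idea can be sketched in base R: build run ids with `rle()` and take the last `Start_Date` per run with `ave()`. The helper name `fill_na_runs_backward()` and the `"<NA>"` sentinel are illustrative choices, not part of the answer; the sentinel assumes no real `Start_Window` value is literally `"<NA>"`.

```r
# Base-R sketch of the same idea (no dplyr needed).
# fill_na_runs_backward() is a hypothetical helper, not an existing function.
fill_na_runs_backward <- function(df) {
  # rle() treats each NA as a separate run, so encode NAs with a sentinel;
  # prefixing Patient_ID keeps runs from crossing patients
  key <- paste(df$Patient_ID,
               ifelse(is.na(df$Start_Window), "<NA>", df$Start_Window))
  runs <- rle(key)
  run_id <- rep(seq_along(runs$lengths), runs$lengths)
  # Last Start_Date of each run, repeated on every row of that run
  last_date <- ave(df$Start_Date, run_id, FUN = function(x) x[length(x)])
  # Only NA rows are replaced; existing Start_Window values are kept
  df$Start_Window <- ifelse(is.na(df$Start_Window), last_date, df$Start_Window)
  df
}

# The question's data (Row_Num omitted)
df <- data.frame(
  Patient_ID = c(rep(1, 10), rep(2, 5)),
  Start_Date = c("2023-04-01", "2023-04-03", "2023-04-05", "2023-04-06", "2023-04-08",
                 "2023-04-09", "2023-04-11", "2023-04-13", "2023-04-16", "2023-04-18",
                 "2023-04-02", "2023-04-04", "2023-04-07", "2023-04-08", "2023-04-10"),
  Start_Window = c("2023-04-01", NA, NA, NA, NA, "2023-04-09", "2023-04-11", NA, NA, "2023-04-18",
                   "2023-04-02", "2023-04-04", "2023-04-07", "2023-04-08", NA),
  stringsAsFactors = FALSE
)

result <- fill_na_runs_backward(df)
print(result)
```

Pasting `Patient_ID` into the key plays the role of the second argument to `consecutive_id()`, and the sentinel is needed because `rle()` regards each missing value as unequal to the previous one.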