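I have a data frame like the one below (my real data is larger and grouped). For an integer variable, I would like to impute each NA with the value from the nearest non-NA observation by date, as long as that observation is less than 30 days away, either before or after. When there is a tie, I want the earlier date rather than the later one. I found this, but it does not handle consecutive NAs.
Any help would be greatly appreciated!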
df <- data.frame(
  dates = c("2023-09-01", "2023-09-02", "2023-09-05", "2023-09-06", "2023-09-10",
            "2023-09-11", "2023-09-14", "2023-09-16", "2023-09-20", "2023-09-27", "2023-09-28"),
  x = c(10, NA, 20, NA, NA, 30, NA, NA, NA, 40, NA)
)
# desired output for the x column
x = c(10, 10, 20, 20, 30, 30, 30, 30, 40, 40, 40)
Here is a tidyverse solution:
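library(tidyverse)
df <- df %>% mutate(dates = as.Date(dates)) # the question's sample stores dates as character
ddd <- filter(df, !is.na(x)) %>% pull(dates) # dates for which x is available
# Signed offset in days to the nearest date that has a value:
# (+) after, (-) before, 0 if nothing is less than 30 days away
useDatediff <- function(d) {
  aftr <- as.numeric(ddd[ddd >= d] - d) # gaps to later dates
  bfr <- as.numeric(d - ddd[ddd <= d]) # gaps to earlier dates
  aftr <- if (length(aftr) > 0) min(aftr) else Inf # Inf when that side is empty
  bfr <- if (length(bfr) > 0) min(bfr) else Inf
  rslt <- if (bfr <= aftr) -bfr else aftr # a tie goes to the earlier date
  # compare the absolute gap so the "before" side is also capped;
  # the question asks for strictly less than 30 days
  if (abs(rslt) >= 30) 0 else rslt # offset 0 leaves the NA in place
}
df %>%
  rowwise() %>%
  mutate(useDatediff = useDatediff(dates)) %>%
  mutate(useDate = dates + useDatediff) %>%
  left_join(df, by = c("useDate" = "dates")) %>% # look up x at the chosen date
  transmute(dates, x = coalesce(x.x, x.y))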
Result:
# A tibble: 11 × 2
# Rowwise:
   dates          x
   <date>     <dbl>
 1 2023-09-01    10
 2 2023-09-02    10
 3 2023-09-05    20
 4 2023-09-06    20
 5 2023-09-10    30
 6 2023-09-11    30
 7 2023-09-14    30
 8 2023-09-16    30
 9 2023-09-20    40
10 2023-09-27    40
11 2023-09-28    40
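The question mentions that the real data is grouped, which the pipeline above does not cover. One possible way to handle that is sketched below in base R. Note that `fill_nearest()`, its `max_days` argument, the `id` grouping column and the toy `df_g` data are all made up for illustration; none of them come from the question or the answer. It is a rough sketch under those assumptions rather than a tested drop-in replacement.

```r
# Hypothetical helper (not part of the original answer): fill each NA in `x`
# with the value observed on the nearest date, if that date is less than
# `max_days` away. Ties go to the earlier date.
fill_nearest <- function(dates, x, max_days = 30) {
  dates <- as.Date(dates)
  known <- which(!is.na(x))            # positions that already have a value
  if (length(known) == 0) return(x)    # nothing to borrow from
  out <- x
  for (i in which(is.na(x))) {
    gap <- as.numeric(dates[known] - dates[i])  # negative = earlier date
    # sort by distance first, then by signed gap so an earlier date wins a tie
    best <- order(abs(gap), gap)[1]
    if (abs(gap[best]) < max_days) out[i] <- x[known[best]]
  }
  out
}

# Toy grouped data; `id` is an assumed grouping column
df_g <- data.frame(
  id    = c("a", "a", "a", "b", "b"),
  dates = as.Date(c("2023-09-01", "2023-09-02", "2023-09-05",
                    "2023-09-01", "2023-09-03")),
  x     = c(10, NA, 20, NA, 7)
)

# Fill within each group separately so values never leak across groups
for (g in unique(df_g$id)) {
  rows <- df_g$id == g
  df_g$x[rows] <- fill_nearest(df_g$dates[rows], df_g$x[rows])
}
df_g$x
# [1] 10 10 20  7  7
```

Since the helper takes plain vectors, it should also work inside a grouped dplyr pipeline, along the lines of `df %>% group_by(id) %>% mutate(x = fill_nearest(dates, x))`, again assuming a grouping column named `id`.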