Replace NAs in a column with the nearest non-NA value by date, subject to a condition, in R


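I have a data frame similar to the one below (my actual data is larger and grouped). I would like to know how to impute the NAs of an integer variable with the nearest non-NA value, where "nearest" is measured by date and only values less than 30 days away from the observation, before or after, qualify. In case of a tie, I want to pick the earlier date over the later one. I found this, but it doesn't handle consecutive NAs.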

Any help would be greatly appreciated!

df <- data.frame(
  dates = c("2023-09-01", "2023-09-02", "2023-09-05", "2023-09-06", "2023-09-10",
            "2023-09-11", "2023-09-14", "2023-09-16", "2023-09-20", "2023-09-27", "2023-09-28"),
  x = c(10, NA, 20, NA, NA, 30, NA, NA, NA, 40, NA)
)

# desired output for the x column

x = c(10, 10, 20, 20, 30, 30, 30, 30, 40, 40, 40)
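Stated precisely, the rule being asked for looks like this. Here S is the set of dates on which x is observed, and the 30-day limit is taken as inclusive, which is what the answer's code does (it only rejects distances greater than 30); a strict limit would use < instead of ≤:

$$
\hat{x}(t) =
\begin{cases}
x(t) & \text{if } x(t) \text{ is observed},\\[4pt]
x(s^{*}) & \text{otherwise, with } s^{*} = \min\Big(\operatorname*{arg\,min}_{s \in S,\ |s - t| \le 30} |s - t|\Big),
\end{cases}
$$

and $\hat{x}(t)$ stays NA when no $s \in S$ satisfies $|s - t| \le 30$. The outer $\min$ picks the earlier date when two observed dates are equally close. For example, for $t$ = 2023-09-20 the closest observed dates are 2023-09-11 (9 days before) and 2023-09-27 (7 days after), so $\hat{x}(t) = 40$, as in the desired output.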

Here is a tidyverse solution:

library(tidyverse)

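df <- df %>% mutate(dates = as.Date(dates))  # convert the character dates from the question to Date

ddd <- filter(df, !is.na(x)) %>% pull(dates) # dates on which x is observed

useDatediff <- function(d) {      # find the nearest date on which a value is observed
  # and return the day difference: (+) after or (-) before.
  # min() of an empty vector returns Inf (with a warning), e.g. after the last observed date.
  aftr <- suppressWarnings(min(as.numeric(ddd[ddd >= d] - d)))
  bfr  <- suppressWarnings(min(as.numeric(d - ddd[ddd <= d])))
  if (bfr <= aftr) {              # on a tie, prefer the earlier date
    rslt <- -bfr
  } else {
    rslt <- aftr
  }
  # check the distance in both directions; 0 makes the row join to itself and stay NA
  if (abs(rslt) > 30) 0 else rslt
}

df %>% 
  rowwise() %>% 
  mutate(useDatediff = useDatediff(dates)) %>% 
  mutate(useDate = dates + useDatediff) %>%   # date whose value will be borrowed
  left_join(df, by = c("useDate" = "dates")) %>% 
  transmute(dates, 
            x = coalesce(x.x, x.y))           # keep the observed x, else the borrowed one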

Result:

# A tibble: 11 × 2
# Rowwise: 
   dates          x
   <date>     <dbl>
 1 2023-09-01    10
 2 2023-09-02    10
 3 2023-09-05    20
 4 2023-09-06    20
 5 2023-09-10    30
 6 2023-09-11    30
 7 2023-09-14    30
 8 2023-09-16    30
 9 2023-09-20    40
10 2023-09-27    40
11 2023-09-28    40
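Since the question mentions grouped data, and the answer's `useDatediff()` reads the observed dates from the global `ddd`, here is a minimal sketch of a self-contained alternative that can be used per group with `group_by()`. The helper `impute_nearest_date()` and the `id` grouping column are hypothetical, not from the question or the answer. It loops over the NA positions instead of using `rowwise()` and a join, treats the 30-day limit as inclusive like the answer's `> 30` check, and breaks ties in favour of the earlier date.

```r
library(dplyr)

# Hypothetical helper (not from the answer): replace each NA in `x` with the
# value observed on the nearest date at most `max_days` away, before or after.
# When two observed dates are equally close, the earlier one wins.
impute_nearest_date <- function(dates, x, max_days = 30) {
  out <- x
  obs <- which(!is.na(x))                          # rows with an observed value
  for (i in which(is.na(x))) {
    gap <- abs(as.numeric(dates[obs] - dates[i]))  # distance in days
    ok  <- gap <= max_days
    if (!any(ok)) next                             # nothing close enough: stay NA
    best <- obs[ok][gap[ok] == min(gap[ok])]       # all rows at the minimum distance
    out[i] <- x[best[which.min(dates[best])]]      # the earliest of those wins a tie
  }
  out
}

df <- data.frame(
  id    = rep(c("a", "b"), times = c(6, 5)),       # hypothetical grouping column
  dates = as.Date(c("2023-09-01", "2023-09-02", "2023-09-05", "2023-09-06",
                    "2023-09-10", "2023-09-11", "2023-09-14", "2023-09-16",
                    "2023-09-20", "2023-09-27", "2023-09-28")),
  x     = c(10, NA, 20, NA, NA, 30, NA, NA, NA, 40, NA)
)

# Ungrouped: reproduces the desired output from the question
df %>% mutate(x_filled = impute_nearest_date(dates, x))

# Grouped: only dates within the same `id` are considered
res <- df %>%
  group_by(id) %>%
  mutate(x_filled = impute_nearest_date(dates, x)) %>%
  ungroup()
res
```

With this made-up grouping, 2023-09-14 and 2023-09-16 fall into group `b`, whose only observed value is 40 on 2023-09-27, so they get 40 instead of the 30 they receive in the ungrouped case. The loop costs roughly (number of NAs) × (number of observed rows) per group, which should be fine for moderately sized groups.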
© www.soinside.com 2019 - 2024. All rights reserved.