dplyr::complete/fill a time series, but only for a limited period of time

Question (votes: 1, answers: 1)

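I'm trying to use complete() and fill() (tidyr functions, loaded here via tidyverse) to fill the gaps in a time series of animal weights, which are recorded roughly weekly most of the time, but I only want to fill within a limited range.

In the example dataset below, several dates are missing: a single weighing on 1/29/2020, and four consecutive weeks in March/April. We can live with one missed weekly weighing (e.g. on 1/29) and can carry the previous weight forward for up to 2 weeks, but you would not want to go beyond that. The second stretch of missing data should therefore be filled for only 13 more days, and the rest of that gap should be NA for Wt_g.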

library(tidyverse)
library(lubridate)

animalwts <- tibble::tribble(
      ~Animal,     ~WtDate, ~Wt_g,
      "A",  "1/1/2020",   20L,
      "A",  "1/8/2020",   21L,
      "A", "1/15/2020",   21L,
      "A", "1/22/2020",   23L,
      "A",  "2/5/2020",   25L,
      "A", "2/12/2020",   23L,
      "A", "2/19/2020",   24L,
      "A", "2/26/2020",   23L,
      "A",  "3/4/2020",   22L,
      "A",  "4/8/2020",   24L
    ) %>%
        mutate(WtDate = mdy(WtDate))

The following code completes the date sequence and fills in all of the missing data:

animalwts %>%
  group_by(Animal) %>%
  complete(WtDate = seq.Date(min(WtDate), max(WtDate), by = "day")) %>%
  fill(Wt_g) 

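But I'd like to figure out how to complete all of the dates while having fill carry a weight forward for at most two weeks from any given measurement date, leaving NA for all further missing data.

If possible, I'd like to stay within the pipe.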

r dplyr time-series fill complete
1 Answer
Votes: 1

Like this?

library(tidyverse)
library(lubridate)

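animalwts %>%
  group_by(Animal) %>%
  # NA_lag is only non-missing on rows that hold an actual measurement
  mutate(NA_lag = WtDate - lag(WtDate),
         last_measurement_date = WtDate) %>% 
  # add one row per calendar day, then carry weight and measurement date forward
  complete(WtDate = seq.Date(min(WtDate), max(WtDate), by = "day")) %>%
  fill(Wt_g) %>% 
  fill(last_measurement_date) %>% 
  # keep Animal in the grouping so runs from different animals are not pooled
  group_by(Animal, last_measurement_date, NA_lag) %>% 
  # number the rows in each run of filled-in days (1, 2, ...)
  mutate(days_missing = row_number()) %>% 
  # blank out weights that were carried forward for more than 14 days
  mutate(Wt_g = if_else(days_missing > 14, NA_integer_, Wt_g))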
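As a possible variation on the pipeline above, the elapsed time since the last real measurement can be computed directly from the dates instead of counting rows. This is only a sketch under a few assumptions: `max_carry_days`, `days_since` and `filled` are names made up for this illustration, and the cutoff of 13 is taken from the question's wording ("13 more days"), so adjust it if a weight should be carried for a different length of time.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Same example data as in the question
animalwts <- tibble::tribble(
  ~Animal,     ~WtDate, ~Wt_g,
  "A",  "1/1/2020",   20L,
  "A",  "1/8/2020",   21L,
  "A", "1/15/2020",   21L,
  "A", "1/22/2020",   23L,
  "A",  "2/5/2020",   25L,
  "A", "2/12/2020",   23L,
  "A", "2/19/2020",   24L,
  "A", "2/26/2020",   23L,
  "A",  "3/4/2020",   22L,
  "A",  "4/8/2020",   24L
) %>%
  mutate(WtDate = mdy(WtDate))

# Hypothetical parameter: number of days a weight may be carried
# forward past the day it was measured
max_carry_days <- 13

filled <- animalwts %>%
  group_by(Animal) %>%
  # remember the date of each real measurement before adding rows
  mutate(last_measurement_date = WtDate) %>%
  # one row per calendar day within each animal's date range
  complete(WtDate = seq.Date(min(WtDate), max(WtDate), by = "day")) %>%
  # carry the weight and its measurement date forward
  fill(Wt_g, last_measurement_date) %>%
  mutate(
    # whole days elapsed since the last real measurement
    days_since = as.integer(WtDate - last_measurement_date),
    # drop weights that were carried further than allowed
    Wt_g = if_else(days_since > max_carry_days, NA_integer_, Wt_g)
  ) %>%
  ungroup()

# Inspect the point where the long gap stops being filled
filled %>%
  filter(WtDate >= as.Date("2020-03-16"), WtDate <= as.Date("2020-03-19"))
```

With the example data, the short gap around 1/29 is filled completely, while the weight from 3/4 is carried through 3/17 and is NA from 3/18 until the next weighing on 4/8. Because the cutoff depends only on the date difference within each Animal group, no second group_by is needed.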

Data

animalwts <- tibble::tribble(
  ~Animal,     ~WtDate, ~Wt_g,
  "A",  "1/1/2020",   20L,
  "A",  "1/8/2020",   21L,
  "A", "1/15/2020",   21L,
  "A", "1/22/2020",   23L,
  "A",  "2/5/2020",   25L,
  "A", "2/12/2020",   23L,
  "A", "2/19/2020",   24L,
  "A", "2/26/2020",   23L,
  "A",  "3/4/2020",   22L,
  "A",  "4/8/2020",   24L
) %>%
  mutate(WtDate = mdy(WtDate))