我有一个像这样的数据框
date values
1 2018-09-11 2.049659
2 2018-09-19 2.537952
3 2018-09-24 1.591455
4 2018-10-01 1.012579
5 2018-10-08 1.382486
6 2018-10-15 2.533606
...
193 2023-12-19 1.262344
我希望计算过去(大约)365 天的
values
的累积平均值。我遇到的问题是条目的间距不均匀;条目之间的天数范围为 2 至 35 天。我脑子里的方法是以某种方式确定每个条目最接近 365(+/-) 天的日期,并计算该范围内所有 values
的平均值,但我正在努力实现这种方法。
例如,如果日期是2023-12-19,并且有日期2022-12-21和2022-12-14,我希望在2023-12-19和2022-12-21之间计算平均值.
所需的输出数据框将简单地向现有数据框添加一列,并有一个新列
Annual Rolling Mean
,其中 NA 从 2018 年 9 月 11 日起第一个日期 ~ 365 天,然后是连续平均值到 2023 年 12 月 19 日最后一个values
。我得到的最接近的是使用date
,但这显然不是处理非连续时间序列数据的正确方法,因为它不使用日期,只使用行数。
可重现的数据slider::slide_dbl
library(tidyverse)
library(slider)
dat <- data.frame(date = as.Date(
c("2018-09-11", "2018-09-19", "2018-09-24", "2018-10-01", "2018-10-08",
"2018-10-15", "2018-10-23", "2018-10-31", "2018-11-05", "2018-11-13",
"2018-11-19", "2018-11-21", "2018-11-28", "2018-12-04", "2018-12-11",
"2018-12-18", "2018-12-26", "2019-01-02", "2019-01-08", "2019-01-15",
"2019-01-22", "2019-01-29", "2019-02-05", "2019-02-12", "2019-02-19",
"2019-02-26", "2019-03-05", "2019-03-12", "2019-03-19", "2019-03-26",
"2019-04-02", "2019-04-09", "2019-04-16", "2019-04-23", "2019-04-30",
"2019-05-07", "2019-05-14", "2019-05-21", "2019-05-28", "2019-06-04",
"2019-06-11", "2019-06-18", "2019-06-25", "2019-07-02", "2019-07-09",
"2019-07-16", "2019-07-30", "2019-08-13", "2019-08-27", "2019-09-10",
"2019-09-24", "2019-10-02", "2019-10-08", "2019-10-22", "2019-11-05",
"2019-11-19", "2019-12-03", "2019-12-17", "2019-12-31", "2020-01-14",
"2020-01-28", "2020-02-11", "2020-02-25", "2020-03-10", "2020-04-07",
"2020-05-05", "2020-06-02", "2020-07-07", "2020-08-05", "2020-08-18",
"2020-09-01", "2020-09-15", "2020-09-29", "2020-10-06", "2020-10-13",
"2020-10-20", "2020-10-27", "2020-11-10", "2020-11-24", "2020-12-08",
"2020-12-22", "2021-01-05", "2021-01-19", "2021-02-02", "2021-02-16",
"2021-03-02", "2021-03-16", "2021-03-30", "2021-04-13", "2021-04-27",
"2021-04-29", "2021-05-11", "2021-05-25", "2021-06-08", "2021-06-22",
"2021-07-07", "2021-07-20", "2021-08-03", "2021-08-17", "2021-08-31",
"2021-09-14", "2021-09-28", "2021-10-12", "2021-10-13", "2021-10-19",
"2021-10-20", "2021-10-26", "2021-11-09", "2021-11-23", "2021-12-07",
"2021-12-21", "2022-01-04", "2022-01-19", "2022-02-01", "2022-02-15",
"2022-03-01", "2022-03-15", "2022-03-29", "2022-04-12", "2022-04-26",
"2022-05-10", "2022-05-24", "2022-06-07", "2022-06-21", "2022-07-06",
"2022-07-19", "2022-08-02", "2022-08-16", "2022-08-30", "2022-09-13",
"2022-10-11", "2022-10-24", "2022-11-01", "2022-11-08", "2022-11-15",
"2022-11-22", "2022-11-29", "2022-12-06", "2022-12-13", "2022-12-20",
"2022-12-29", "2023-01-04", "2023-01-10", "2023-01-18", "2023-01-24",
"2023-01-31", "2023-02-07", "2023-02-14", "2023-02-21", "2023-02-28",
"2023-03-07", "2023-03-14", "2023-03-21", "2023-03-28", "2023-04-04",
"2023-04-11", "2023-04-18", "2023-04-25", "2023-05-02", "2023-05-09",
"2023-05-16", "2023-05-23", "2023-05-31", "2023-06-06", "2023-06-13",
"2023-06-20", "2023-06-27", "2023-07-05", "2023-07-11", "2023-07-18",
"2023-07-25", "2023-08-01", "2023-08-08", "2023-08-15", "2023-08-22",
"2023-08-29", "2023-09-05", "2023-09-12", "2023-09-19", "2023-09-26",
"2023-09-26", "2023-10-03", "2023-10-10", "2023-10-17", "2023-10-24",
"2023-10-31", "2023-11-07", "2023-11-14", "2023-11-21", "2023-11-28",
"2023-12-05", "2023-12-12", "2023-12-19")),
values = runif(193, min = 0, max = 3))
## there is sometimes NA values
dat[75, 2] <- NA
## my closest attempt
rolling_mean <- dat %>%
mutate(year = lubridate::year(date),
days_since_last = date - lag(date, n = 1 )) %>%
group_by(year, isna = is.na(values)) %>%
mutate(`Annual Rolling Mean` = round(slider::slide_dbl(values, mean, .before = 52), 2)) %>%
ungroup() %>%
fill(`Annual Rolling Mean`)