Calculating a rolling mean over the past ~365 days with non-continuous time series data

Question (votes: 0, answers: 1)

I have a data frame like this:

        date   values
1 2018-09-11 2.049659
2 2018-09-19 2.537952
3 2018-09-24 1.591455
4 2018-10-01 1.012579
5 2018-10-08 1.382486
6 2018-10-15 2.533606
...
193 2023-12-19 1.262344

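I would like to calculate the rolling mean of the values column over the past (approximately) 365 days. The problem is that the entries are unevenly spaced; the gap between consecutive entries ranges from 2 to 35 days. The approach I have in mind is to find, for each entry, the date closest to 365 (+/-) days earlier and compute the mean of all values in that range, but I am struggling to implement it.

For example, if the date is 2023-12-19 and the data contains the dates 2022-12-21 and 2022-12-14, I want the mean calculated between 2023-12-19 and 2022-12-21.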
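
The "nearest date to 365 days earlier" idea described above could be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the helper name nearest_window_mean and the four-row example data are made up here, the dates are assumed to be sorted, ties between two equally close dates go to the earlier one, and rows whose target date falls before the first observation get NA.

```r
# Hypothetical helper (not from the question): for each row, average `values`
# from the row whose date is closest to (date - window_days) up to that row.
nearest_window_mean <- function(date, values, window_days = 365) {
  stopifnot(length(date) == length(values),
            all(diff(as.numeric(date)) >= 0))   # dates must be sorted
  vapply(seq_along(date), function(i) {
    target <- date[i] - window_days
    # Not enough history yet: the target lies before the first observation.
    if (target < date[1]) return(NA_real_)
    # Row at or before i whose date is nearest to the target date.
    start <- which.min(abs(as.numeric(date[seq_len(i)] - target)))
    mean(values[start:i], na.rm = TRUE)
  }, numeric(1))
}

# Made-up data mirroring the example: the target for 2023-12-19 is 2022-12-19;
# 2022-12-21 is 2 days away and 2022-12-14 is 5 days away.
ex <- data.frame(
  date   = as.Date(c("2022-12-14", "2022-12-21", "2023-06-20", "2023-12-19")),
  values = c(10, 1, 2, 3)
)
ex[["Annual Rolling Mean"]] <- nearest_window_mean(ex$date, ex$values)
print(ex)
```

In this sketch the last row averages the values from 2022-12-21 onward, so the entry on 2022-12-14 is left out, which matches the behaviour asked for in the example.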

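The desired output would simply add one new column, Annual Rolling Mean, to the existing data frame. It should be NA for roughly the first 365 days after the first date (2018-09-11) and then contain the rolling mean through to the last date (2023-12-19).

The closest I have come is using slider::slide_dbl, but this is clearly not the right way to handle non-continuous time series data, because it does not use the date column, only the number of rows.

Reproducible data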


r dplyr tidyverse lubridate rolling-computation
1 answer
0 votes

library(tidyverse)
library(slider)

dat <- data.frame(
  date = as.Date(c(
    "2018-09-11", "2018-09-19", "2018-09-24", "2018-10-01", "2018-10-08", "2018-10-15",
    "2018-10-23", "2018-10-31", "2018-11-05", "2018-11-13", "2018-11-19", "2018-11-21",
    "2018-11-28", "2018-12-04", "2018-12-11", "2018-12-18", "2018-12-26", "2019-01-02",
    "2019-01-08", "2019-01-15", "2019-01-22", "2019-01-29", "2019-02-05", "2019-02-12",
    "2019-02-19", "2019-02-26", "2019-03-05", "2019-03-12", "2019-03-19", "2019-03-26",
    "2019-04-02", "2019-04-09", "2019-04-16", "2019-04-23", "2019-04-30", "2019-05-07",
    "2019-05-14", "2019-05-21", "2019-05-28", "2019-06-04", "2019-06-11", "2019-06-18",
    "2019-06-25", "2019-07-02", "2019-07-09", "2019-07-16", "2019-07-30", "2019-08-13",
    "2019-08-27", "2019-09-10", "2019-09-24", "2019-10-02", "2019-10-08", "2019-10-22",
    "2019-11-05", "2019-11-19", "2019-12-03", "2019-12-17", "2019-12-31", "2020-01-14",
    "2020-01-28", "2020-02-11", "2020-02-25", "2020-03-10", "2020-04-07", "2020-05-05",
    "2020-06-02", "2020-07-07", "2020-08-05", "2020-08-18", "2020-09-01", "2020-09-15",
    "2020-09-29", "2020-10-06", "2020-10-13", "2020-10-20", "2020-10-27", "2020-11-10",
    "2020-11-24", "2020-12-08", "2020-12-22", "2021-01-05", "2021-01-19", "2021-02-02",
    "2021-02-16", "2021-03-02", "2021-03-16", "2021-03-30", "2021-04-13", "2021-04-27",
    "2021-04-29", "2021-05-11", "2021-05-25", "2021-06-08", "2021-06-22", "2021-07-07",
    "2021-07-20", "2021-08-03", "2021-08-17", "2021-08-31", "2021-09-14", "2021-09-28",
    "2021-10-12", "2021-10-13", "2021-10-19", "2021-10-20", "2021-10-26", "2021-11-09",
    "2021-11-23", "2021-12-07", "2021-12-21", "2022-01-04", "2022-01-19", "2022-02-01",
    "2022-02-15", "2022-03-01", "2022-03-15", "2022-03-29", "2022-04-12", "2022-04-26",
    "2022-05-10", "2022-05-24", "2022-06-07", "2022-06-21", "2022-07-06", "2022-07-19",
    "2022-08-02", "2022-08-16", "2022-08-30", "2022-09-13", "2022-10-11", "2022-10-24",
    "2022-11-01", "2022-11-08", "2022-11-15", "2022-11-22", "2022-11-29", "2022-12-06",
    "2022-12-13", "2022-12-20", "2022-12-29", "2023-01-04", "2023-01-10", "2023-01-18",
    "2023-01-24", "2023-01-31", "2023-02-07", "2023-02-14", "2023-02-21", "2023-02-28",
    "2023-03-07", "2023-03-14", "2023-03-21", "2023-03-28", "2023-04-04", "2023-04-11",
    "2023-04-18", "2023-04-25", "2023-05-02", "2023-05-09", "2023-05-16", "2023-05-23",
    "2023-05-31", "2023-06-06", "2023-06-13", "2023-06-20", "2023-06-27", "2023-07-05",
    "2023-07-11", "2023-07-18", "2023-07-25", "2023-08-01", "2023-08-08", "2023-08-15",
    "2023-08-22", "2023-08-29", "2023-09-05", "2023-09-12", "2023-09-19", "2023-09-26",
    "2023-09-26", "2023-10-03", "2023-10-10", "2023-10-17", "2023-10-24", "2023-10-31",
    "2023-11-07", "2023-11-14", "2023-11-21", "2023-11-28", "2023-12-05", "2023-12-12",
    "2023-12-19"
  )),
  values = runif(193, min = 0, max = 3)
)

## there are sometimes NA values
dat[75, 2] <- NA

## my closest attempt
rolling_mean <- dat %>%
  mutate(year = lubridate::year(date),
         days_since_last = date - lag(date, n = 1)) %>%
  group_by(year, isna = is.na(values)) %>%
  mutate(`Annual Rolling Mean` = round(slider::slide_dbl(values, mean, .before = 52), 2)) %>%
  ungroup() %>%
  fill(`Annual Rolling Mean`)

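One way to make the window follow the dates rather than the row count is slider's index-aware variant, slide_index_dbl. The sketch below is a minimal illustration on made-up data (the toy data frame is not the asker's data). It assumes a strict 365-day look-back, which is slightly different from picking the single date nearest to 365 days earlier, and it relies on .complete = TRUE to leave NA wherever the window would reach back past the first date.

```r
library(dplyr)
library(slider)

# Made-up, unevenly spaced data with one missing value (illustration only).
toy <- data.frame(
  date   = as.Date(c("2022-01-05", "2022-06-01", "2022-12-25",
                     "2023-01-10", "2023-06-15")),
  values = c(1, 2, NA, 4, 6)
)

toy <- toy %>%
  mutate(`Annual Rolling Mean` = slide_index_dbl(
    .x = values,
    .i = date,                       # window is measured on dates, not rows
    .f = ~ mean(.x, na.rm = TRUE),   # skip missing values inside the window
    .before = 365,                   # with a Date index: 365 days back
    .complete = TRUE                 # NA until a full window of history exists
  ))

print(toy)
```

The same mutate() call should carry over to dat from the question without the group_by() and fill() steps, since the window no longer depends on how many rows fall within a calendar year.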