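The goal is to compute a rolling weighted average by group over a window of 3 rows, with weights 3, 2, 1 starting from the most recent row. This is similar to the question here, but the weights are not given by a column. I would also really like to use frollsum(), because I am working with a large amount of data and need it to be fast.
I have a solution using frollapply():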
library(data.table)
# Example data
set.seed(1)
DT <- data.table(group = rep(c(1, 2), each = 10), value = round(runif(n = 20, 1, 5)))
weights <- 1:3
k <- 3
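weighted_average <- function(x) {
  # weights are ordered oldest to newest, so the most recent row gets weight 3
  sum(x * weights[seq_along(x)]) / sum(weights[seq_along(x)])
}
# Apply the rolling weighted average, lagged by one row within each group
DT[, wtavg := shift(frollapply(value, k, weighted_average, align = "right", fill = NA)),
   by = group]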
DT
#>     group value    wtavg
#>  1:     1     2       NA
#>  2:     1     2       NA
#>  3:     1     3       NA
#>  4:     1     5 2.500000
#>  5:     1     2 3.833333
#>  6:     1     5 3.166667
#>  7:     1     5 4.000000
#>  8:     1     4 4.500000
#>  9:     1     4 4.500000
#> 10:     1     1 4.166667
#> 11:     2     2       NA
#> 12:     2     2       NA
#> 13:     2     4       NA
#> 14:     2     3 3.000000
#> 15:     2     4 3.166667
#> 16:     2     3 3.666667
#> 17:     2     4 3.333333
#> 18:     2     5 3.666667
#> 19:     2     3 4.333333
#> 20:     2     4 3.833333
Created on 2023-11-27 with reprex v2.0.2
You can get a significant speedup simply by using frollsum three times:
shift((frollsum(value, 3) + frollsum(value, 2) + frollsum(value, 1)) / 6)
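To see why the three sums reproduce the 3, 2, 1 weights, here is a short derivation. The notation is introduced only for illustration: $x_t$ is the value in the current row and $S_n$ is the result of frollsum(value, n) at that row. Each value is counted once by every window that still contains it:

```latex
\begin{aligned}
S_3 + S_2 + S_1 &= (x_t + x_{t-1} + x_{t-2}) + (x_t + x_{t-1}) + x_t \\
                &= 3x_t + 2x_{t-1} + x_{t-2} \\
\frac{S_3 + S_2 + S_1}{3 + 2 + 1} &= \frac{3x_t + 2x_{t-1} + x_{t-2}}{6}
\end{aligned}
```

The first two rows of each group are NA because $S_3$ is NA there, and shift() then moves the result down one row, just as in the frollapply() version.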
Benchmark
set.seed(1)
n = 1000000
groups = 1:1000
DT <- data.table(group = rep(groups, each = n/length(groups)), value = round(runif(n = n, 1, 5)))
bench::mark(
  A = {
    DT[, shift(frollapply(value, k, weighted_average, align = "right", fill = NA)),
       by = group]
  },
  B = {
    DT[, shift((frollsum(value, 3) + frollsum(value, 2) + frollsum(value, 1)) / 6),
       by = group]
  }
)
# # A tibble: 2 × 13
#   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
# 1 A             1.68s    1.68s     0.595    46.7MB     29.8     1    50      1.68s
# 2 B          112.11ms 119.48ms     7.67    100.8MB     15.3     4     8   521.21ms
# # ℹ 4 more variables: result <list>, memory <list>, time <list>, gc <list>
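If the window size might change, the three-frollsum trick seems to extend to any k with linear weights 1..k. The sketch below is only an illustration: linear_wtavg() is a hypothetical helper, not part of data.table, and it assumes the weights are always 1, 2, ..., k with the most recent row weighted k.

```r
library(data.table)

# Hypothetical helper (not a data.table function): rolling weighted average
# with linear weights 1..k, where the most recent row gets weight k.
linear_wtavg <- function(x, k) {
  # One rolling sum per window size 1..k; lapply() always returns a list
  sums <- lapply(seq_len(k), function(n) frollsum(x, n))
  # A value that is j rows old is counted by k - j of the windows, giving the linear weights
  Reduce(`+`, sums) / sum(seq_len(k))
}

set.seed(1)
DT <- data.table(group = rep(c(1, 2), each = 10),
                 value = round(runif(n = 20, 1, 5)))

# Lagged by one row within each group, as in the expressions above
DT[, wtavg := shift(linear_wtavg(value, 3)), by = group]
# Reference column using the explicit three-frollsum expression
DT[, wtavg_ref := shift((frollsum(value, 3) + frollsum(value, 2) + frollsum(value, 1)) / 6),
   by = group]
```

For k = 3 the two columns should agree. Memory use grows with k, since each window size allocates its own vector, so for large windows this may not be the best trade-off.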