我正在处理R数据帧
带列GROUP_COL | TIME| VALUE
。时间是有序的,值是数字,而col是我要对数据进行分组的分类变量。我的目标是
GROUP_COL
变量组成的第一组TIME
排序value = 0.1 * previous_value + 0.9 * value
计算每个组中值的加权平均值。如果没有先前的值,请保留该值不变。WEIGHTED
中。到目前为止,我尝试的是:Usng`dplyr,我使用lag()创建了一个先前值的向量
weighted_avg_with_previous <- function(.data, lag_weight=0.1) {
# get previous values
lag_val <- lag(.data$VALUE, n = 1L, default = 0, order_by = .data$TIME)
# give each value a weight 0.9 for current value and 0.1 for previous value
weighted = (1 -lag_weight) * .data$VALUE + lag_weight * lag_val
return (weighted)
}
data <- data %>%
group_by(SALES_RESPONSIBILITY, PRODUCT_AREA, CURRENCY, FORECAST_TYPE) %>%
arrange(HORIZON, .by_group=TRUE) %>%
mutate(WEIGHTED_VALUE = weighted_avg_with_previous(0.1))
但是,mutate
语句引发错误。如何使我的weighted_avg_with_previous
函数在单个组上运行?
示例:
GROUP | TIME| VALUE | WEIGHTED VALUE
_____________________________________
A | 1 | 1 | 1
A | 2 | 2 | 1.9
A | 3 | 3 | 2.9
A | 4 | 4 | 3.9
B | 1 | 3 | 3
B | 2 | 7 | 6.6
B | 3 | -4 | -3.3
...
最好,朱莉娅
library(tidyverse)
df <- structure(list(GROUP = c("A", "A", "A", "A", "B", "B", "B"),
TIME = c(1L, 2L, 3L, 4L, 1L, 2L, 3L), VALUE = c(1L, 2L, 3L,
4L, 3L, 7L, -4L)), row.names = c(NA, -7L), class = c("tbl_df",
"tbl", "data.frame"))
df %>%
group_by(GROUP) %>%
mutate(previous.value = lag(VALUE)) %>%
mutate(weighted.value = ifelse(is.na(previous.value),VALUE, 0.1*previous.value + 0.9*VALUE)) %>%
select(-previous.value)
第一个mutate()
语句为滞后的value
创建一个新变量,第二个语句创建weighted.value
,该变量等于0.1*previous.value + 0.9*value
,或者如果value
为空,则等于previous.value
。
输出:
# A tibble: 7 x 4
# Groups: GROUP [2]
GROUP TIME VALUE weighted.value
<chr> <int> <int> <dbl>
1 A 1 1 1
2 A 2 2 1.9
3 A 3 3 2.9
4 A 4 4 3.9
5 B 1 3 3
6 B 2 7 6.6
7 B 3 -4 -2.9