I have a function that computes the modified z-score of a variable, as follows:
calculate_modified_z_score <- function(x) {
  # Weighted median of x; note that df$weight is always the full column
  median_x <- weighted.median(x, w = df$weight, na.rm = TRUE)
  # Unweighted MAD with a scaling factor of 1, excluding NA values
  mad_x <- mad(x, constant = 1, na.rm = TRUE)
  if (mad_x == 0) {
    # Fall back to the mean absolute deviation around the weighted median, excluding NA values
    meanAD_x <- mean(abs(x - median_x), na.rm = TRUE)
    return((x - median_x) / (1.253314 * meanAD_x))
  } else {
    return((x - median_x) / (1.486 * mad_x))
  }
}
I run it on the data frame in my code like this:
df %>% mutate(z_of_var = calculate_modified_z_score(var))
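This works. However, I want to do this by group, so that the weighted median is taken within each level of group_var. The problem is that the weight variable no longer fits, because df$weight has the length of the whole data frame while x only has the length of one group. So I get this error: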
df %>% group_by(group_var) %>% mutate(z_of_var = calculate_modified_z_score(var))
Error in `mutate()`:
ℹ In argument: `z_of_var = calculate_modified_z_score(var)`.
ℹ In group 1: `group_var = "1"`.
Caused by error in `weighted.quantile()`:
! length(x) == length(w) is not TRUE
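The length mismatch can be made visible with a small sketch. The data frame toy and its values are made up for illustration and are not from the original data; the point is only that, inside a grouped dplyr verb, a bare column name is sliced to the current group while toy$weight always refers to the full column.

```r
library(dplyr)

# Hypothetical toy data: group "a" has 2 rows, group "b" has 3 rows
toy <- data.frame(
  group_var = c("a", "a", "b", "b", "b"),
  var       = c(1, 2, 3, 4, 5),
  weight    = c(1, 2, 1, 1, 3)
)

lengths_by_group <- toy %>%
  group_by(group_var) %>%
  summarise(
    n_var          = length(var),        # the current group's slice
    n_weight_group = length(weight),     # also the current group's slice
    n_weight_whole = length(toy$weight)  # always the full column (5 rows)
  )

print(lengths_by_group)
```

For group "a" the slice of var has 2 elements but toy$weight still has 5, which is the kind of mismatch that weighted.quantile() rejects.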
I understand why this fails, but the function also fails if I drop df from its definition. If I do this:
calculate_modified_z_score <- function(x) {
  # weight is not an argument or a local variable of this function
  median_x <- weighted.median(x, w = weight, na.rm = TRUE)
  # Unweighted MAD with a scaling factor of 1, excluding NA values
  mad_x <- mad(x, constant = 1, na.rm = TRUE)
  if (mad_x == 0) {
    # Fall back to the mean absolute deviation around the weighted median, excluding NA values
    meanAD_x <- mean(abs(x - median_x), na.rm = TRUE)
    return((x - median_x) / (1.253314 * meanAD_x))
  } else {
    return((x - median_x) / (1.486 * mad_x))
  }
}
df %>% group_by(group_var) %>% mutate(z_of_var = calculate_modified_z_score(var))
I get the error:
Error in `mutate()`:
ℹ In argument: `z_po_mil = calculate_modified_z_score(var)`.
ℹ In group 1: `group_var = "a"`.
Caused by error in `calculate_modified_z_score()`:
! object 'weight' not found
Backtrace:
1. ... %>% select(z_po_mil)
10. global calculate_modified_z_score(var)
11. spatstat.geom::weighted.median(x, w = vote, na.rm = TRUE)
13. spatstat.geom::weighted.quantile(...)
14. base::as.vector(w)
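This error is consistent with R's lexical scoping: dplyr's data masking covers only the expressions written directly inside mutate(), so a bare name inside a function body is looked up in the environment where the function was defined, not among the data frame's columns. A minimal sketch, using a hypothetical toy data frame and a hypothetical function uses_free_variable, and assuming no global object named weight exists:

```r
library(dplyr)

# Hypothetical toy data, not from the original post
toy <- data.frame(var = c(1, 2, 3), weight = c(10, 20, 30))

# `weight` is a free variable: R looks for it in the function's defining
# environment (here the global environment), never in the columns of `toy`.
uses_free_variable <- function(x) x * weight

outcome <- tryCatch(
  {
    toy %>% mutate(y = uses_free_variable(var))
    "no error"
  },
  error = function(e) "error"
)

# Written directly inside mutate(), `weight` is resolved as a column.
direct <- toy %>% mutate(y = var * weight)

print(outcome)
print(direct)
```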
How can I apply this function by group and use each observation's weight within its group?
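It looks like this did the trick! Much simpler than I expected: pass the weights to the function as a second argument.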
calculate_modified_z_score <- function(x, y) {
  # x: values; y: weights, same length as x
  median_x <- weighted.median(x, w = y, na.rm = TRUE)
  # Unweighted MAD with a scaling factor of 1, excluding NA values
  mad_x <- mad(x, constant = 1, na.rm = TRUE)
  if (mad_x == 0) {
    # Fall back to the mean absolute deviation around the weighted median, excluding NA values
    meanAD_x <- mean(abs(x - median_x), na.rm = TRUE)
    return((x - median_x) / (1.253314 * meanAD_x))
  } else {
    return((x - median_x) / (1.486 * mad_x))
  }
}
df %>% group_by(group_var) %>% mutate(z_of_var = calculate_modified_z_score(var, weight))
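To check that a weight column passed as an argument is sliced per group, here is a minimal self-contained sketch. The toy data are invented, and simple_weighted_median is a hypothetical base-R stand-in for spatstat.geom::weighted.median (a lower weighted median without interpolation, so its values may differ from spatstat's); neither comes from the original post.

```r
library(dplyr)

# Hypothetical helper: smallest x whose cumulative weight reaches half the total.
simple_weighted_median <- function(x, w) {
  keep <- !is.na(x) & !is.na(w)
  x <- x[keep]
  w <- w[keep]
  ord <- order(x)
  cum_w <- cumsum(w[ord])
  x[ord][which(cum_w >= sum(w) / 2)[1]]
}

# Hypothetical toy data with two groups of three rows each
toy <- data.frame(
  group_var = c("a", "a", "a", "b", "b", "b"),
  var       = c(1, 2, 10, 5, 6, 7),
  weight    = c(1, 1, 5, 3, 1, 1)
)

# Because `weight` is written inside mutate(), it is evaluated per group
# and arrives in the helper with the same length as `var`.
res <- toy %>%
  group_by(group_var) %>%
  mutate(w_median = simple_weighted_median(var, weight)) %>%
  ungroup()

print(res)
```

Under these assumptions the weighted median is 10 for group "a" (the heavily weighted value dominates) and 5 for group "b", so each group is computed from its own rows and weights only.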