Applying a modified z-score function by group with weights


I have a function that computes the modified z-score of a variable, as follows:

calculate_modified_z_score <- function(x) {
  median_x <- weighted.median(x, w = df$weight, na.rm = TRUE)
  mad_x <- mad(x, constant = 1, na.rm = TRUE) # MAD with a scaling factor of 1, and excluding NA values

  if (mad_x == 0) {
    meanAD_x <- mean(abs(x - median_x), na.rm = TRUE) # mean absolute deviation, excluding NA values
    return((x - median_x) / (1.253314 * meanAD_x))
  } else {
    return((x - median_x) / (1.486 * mad_x))
  }
}

I run it on a data frame in my code like this:

df %>% mutate(z_of_var = calculate_modified_z_score(var))

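This works. However, I want to do this by group, so that the weighted median is taken within each level of `group_var`. The problem is that the `weight` variable no longer fits, because `df$weight` has the length of the whole data frame rather than the length of the group. So I get an error: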

df %>% group_by(group_var) %>% mutate(z_of_var = calculate_modified_z_score(var))

Error in `mutate()`:
ℹ In argument: `z_of_var = calculate_modified_z_score(var)`.
ℹ In group 1: `group_var = "1"`.
Caused by error in `weighted.quantile()`:
! length(x) == length(w) is not TRUE
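
The length mismatch can be reproduced without dplyr. This is a minimal sketch on a made-up data frame (`toy` and its values are hypothetical, for illustration only): within a group, `x` holds only that group's rows, while a `$weight` lookup on the data frame always returns the full column.

```r
# Hypothetical toy data: 5 rows in two groups
toy <- data.frame(
  group_var = c("a", "a", "a", "b", "b"),
  var       = c(1, 2, 3, 10, 20),
  weight    = c(1, 1, 2, 1, 3)
)

# What the function receives as x for group "a": only that group's values
x_a <- toy$var[toy$group_var == "a"]

n_values  <- length(x_a)         # number of values in group "a"
n_weights <- length(toy$weight)  # number of weights, ignoring the grouping

# The two lengths differ, which is what weighted.quantile() complains about
n_values == n_weights
```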

I understand why this doesn't work, but it also fails if I drop `df` from the function definition. If I do this:


calculate_modified_z_score <- function(x) {
  median_x <- weighted.median(x, w = weight, na.rm = TRUE)
  mad_x <- mad(x, constant = 1, na.rm = TRUE) # MAD with a scaling factor of 1, and excluding NA values

  if (mad_x == 0) {
    meanAD_x <- mean(abs(x - median_x), na.rm = TRUE) # mean absolute deviation, excluding NA values
    return((x - median_x) / (1.253314 * meanAD_x))
  } else {
    return((x - median_x) / (1.486 * mad_x))
  }
}

df %>% group_by(group_var) %>% mutate(z_of_var = calculate_modified_z_score(var))

I get the error:


Error in `mutate()`:
ℹ In argument: `z_po_mil = calculate_modified_z_score(var)`.
ℹ In group 1: `group_var = "a"`.
Caused by error in `calculate_modified_z_score()`:
! object 'weight' not found
Backtrace:
  1. ... %>% select(z_po_mil)
 10. global calculate_modified_z_score(var)
 11. spatstat.geom::weighted.median(x, w = vote, na.rm = TRUE)
 13. spatstat.geom::weighted.quantile(...)
 14. base::as.vector(w)

How can I run this function by group, using the `weight` of each observation within the group?

r weighted z-score
1 Answer

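It looks like this did the trick! Much simpler than I expected. Passing the weights as a second argument means `mutate()` evaluates `weight` per group, so the function receives weights of the same length as `x`.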

library(dplyr)
library(spatstat.geom) # provides weighted.median()

# x: values to score; y: weights, same length as x
calculate_modified_z_score <- function(x, y) {
  median_x <- weighted.median(x, w = y, na.rm = TRUE)
  mad_x <- mad(x, constant = 1, na.rm = TRUE) # MAD with a scaling factor of 1, excluding NA values

  if (mad_x == 0) {
    # Fall back to the mean absolute deviation when the MAD is zero
    meanAD_x <- mean(abs(x - median_x), na.rm = TRUE)
    return((x - median_x) / (1.253314 * meanAD_x))
  } else {
    return((x - median_x) / (1.486 * mad_x))
  }
}

df %>%
  group_by(group_var) %>%
  mutate(z_of_var = calculate_modified_z_score(var, weight))
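
To see the per-group behaviour on concrete numbers, here is a base R sketch that needs no extra packages. Everything in it is an assumption made for illustration: `toy` is made-up data, `modified_z` mirrors the structure of the function above, and `simple_weighted_median` is a simplified stand-in for `spatstat.geom::weighted.median()` that may return different values from the real function.

```r
# Hypothetical stand-in for spatstat.geom::weighted.median():
# the smallest x whose cumulative weight reaches half the total weight.
simple_weighted_median <- function(x, w) {
  keep <- !is.na(x) & !is.na(w)
  x <- x[keep]
  w <- w[keep]
  ord <- order(x)
  cum_w <- cumsum(w[ord])
  x[ord][which(cum_w >= sum(w) / 2)[1]]
}

# Same structure as the answer's function, with weights passed explicitly
modified_z <- function(x, w) {
  median_x <- simple_weighted_median(x, w)
  mad_x <- mad(x, constant = 1, na.rm = TRUE)
  if (mad_x == 0) {
    mean_ad_x <- mean(abs(x - median_x), na.rm = TRUE)
    (x - median_x) / (1.253314 * mean_ad_x)
  } else {
    (x - median_x) / (1.486 * mad_x)
  }
}

# Hypothetical toy data with equal weights
toy <- data.frame(
  group_var = c("a", "a", "a", "b", "b", "b"),
  var       = c(1, 2, 3, 10, 20, 60),
  weight    = c(1, 1, 1, 1, 1, 1)
)

# Apply per group: each call receives only that group's var and weight
pieces <- lapply(split(toy, toy$group_var),
                 function(g) modified_z(g$var, g$weight))
toy$z_of_var <- unsplit(pieces, toy$group_var)
```

Each group is centred on its own weighted median, so the middle observation of each group scores 0 here. With dplyr, the `split()`/`unsplit()` step corresponds to `group_by(group_var)` followed by `mutate()`.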

