R query: can the "sapply" and "weighted.mean" functions be used together?


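I have been using code to compute means for specific values of a variable (demographic breaks), but my data now includes a weight variable and I need to compute weighted means. I already have code that computes the sample means, and I would like to know whether the function can be changed or adapted to compute weighted means instead. Here is some code to generate sample data: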

df <- data.frame(gender=c(2,2,1,1,2,2,1,1,1,1,1,1,2,2,2,2,1,2,2,1),
                 agegroup=c(2,2,7,5,5,5,2,7,2,2,4,4,4,3,4,5,3,3,6,6),
                 attitude_1=c(4,3,4,4,4,4,4,4,5,2,5,5,5,4,3,2,3,4,2,4),
                 attitude_2=c(4,4,1,3,4,2,4,5,5,5,5,4,5,4,3,3,4,4,4,4),
                 attitude_3=c(2,2,1,1,3,2,5,1,4,2,2,2,3,3,4,1,4,1,3,1),
                 income=c(40794,74579,62809,47280,72056,57908,70784,96742,66629,117530,79547,54110,39569,111217,109146,56421,106206,28385,85830,71110),
                 weight=c(1.77,1.89,2.29,6.14,2.07,5.03,0.73,1.60,1.95,2.56,5.41,2.02,6.87,3.23,3.01,4.68,3.42,2.75,2.31,4.04))

So far I have been using this code to get the sample means:

assign("Gender_Profile_1", 
       data.frame(sapply(subset(df, gender==1), FUN = function(x) mean(x, na.rm = TRUE))))

> Gender_Profile_1
           sapply.subset.df..gender....1...FUN...function.x..mean.x..na.rm...TRUE..
gender                                                                        1.000
agegroup                                                                      4.200
attitude_1                                                                    4.000
attitude_2                                                                    4.000
attitude_3                                                                    2.300
income                                                                    77274.700
weight                                                                        3.016

As you can see, it produces Gender_Profile_1, which contains the means of all the variables. To compute weighted means, I tried changing the "FUN =" part to this:

assign("Gender_Profile_1", 
       data.frame(sapply(subset(df, gender==1), FUN = function(x) weighted.mean(x, w=weight,na.rm = TRUE))))

I get the following error message:

 Error in weighted.mean.default(x, w = weight, na.rm = TRUE) : 
  'x' and 'w' must have the same length 

I have tried various permutations of df$weight and df$x, but nothing seems to work. Any help or ideas would be great. Many thanks.

r sapply weighted-average
1 answer

If you want to stick with base R, you can do the following:

# define func to return all weighted means
all_wmeans <- function(data_subset) {

  # which cols to summarise? all but gender and weight
  summ_cols <- setdiff(names(data_subset), c('gender', 'weight'))

  # for each col, calc weighted mean with weights from the 'weight' column
  result <- lapply(data_subset[, summ_cols], 
                   weighted.mean, w=data_subset$weight)

  # squeeze the resulting list back into a data.frame and return
  return(data.frame(result))
}

# now, split the df on gender, and apply the func to each chunk
lapply(split(df, df$gender), all_wmeans)

The result is a list of two data frames, one for each value of gender:

$`1`
  agegroup attitude_1 attitude_2 attitude_3   income
1 4.397546   4.027851   3.950597   1.962202 74985.25

$`2`
  agegroup attitude_1 attitude_2 attitude_3   income
1 4.092234   3.642666   3.676287   2.388872 64075.23
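
To address the title directly: sapply and weighted.mean can be combined, provided the weights come from the same subset as the values. The error in the question indicates that the weight vector R picked up did not have the same length as the 10-row gender subset. The sketch below is one possible way to keep the original sapply style; the names sub1, value_cols and profiles are illustrative and not part of the original post, and na.rm = TRUE is carried over from the question's code on the assumption that missing values should be dropped.

```r
# Sample data copied from the question
df <- data.frame(gender=c(2,2,1,1,2,2,1,1,1,1,1,1,2,2,2,2,1,2,2,1),
                 agegroup=c(2,2,7,5,5,5,2,7,2,2,4,4,4,3,4,5,3,3,6,6),
                 attitude_1=c(4,3,4,4,4,4,4,4,5,2,5,5,5,4,3,2,3,4,2,4),
                 attitude_2=c(4,4,1,3,4,2,4,5,5,5,5,4,5,4,3,3,4,4,4,4),
                 attitude_3=c(2,2,1,1,3,2,5,1,4,2,2,2,3,3,4,1,4,1,3,1),
                 income=c(40794,74579,62809,47280,72056,57908,70784,96742,66629,117530,79547,54110,39569,111217,109146,56421,106206,28385,85830,71110),
                 weight=c(1.77,1.89,2.29,6.14,2.07,5.03,0.73,1.60,1.95,2.56,5.41,2.02,6.87,3.23,3.01,4.68,3.42,2.75,2.31,4.04))

# Columns to summarise: everything except the grouping and weight columns
value_cols <- setdiff(names(df), c("gender", "weight"))

# Single group: subset first, so values and weights have the same length
sub1 <- subset(df, gender == 1)
Gender_Profile_1 <- sapply(sub1[value_cols], weighted.mean,
                           w = sub1$weight, na.rm = TRUE)
print(Gender_Profile_1)

# All groups at once: the outer sapply loops over the per-gender chunks,
# the inner sapply loops over the columns of each chunk
profiles <- sapply(split(df, df$gender), function(chunk) {
  sapply(chunk[value_cols], weighted.mean, w = chunk$weight, na.rm = TRUE)
})
print(profiles)
```

Gender_Profile_1 is a named numeric vector, and profiles is a matrix with one row per variable and one column per gender value. The gender 1 figures should agree with the `1` element of the list shown above.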