Error when looping ANOVA tests over a data frame in R


I have this data frame:

sub3 <- df1[, c('Attrition', "Age", "DistanceFromHome", "MonthlyIncome", "NumCompaniesWorked",    "PercentSalaryHike", "TotalWorkingYears", "TrainingTimesLastYear", "YearsAtCompany", "YearsSinceLastPromotion", "YearsWithCurrManager")]
    sub3

where Attrition is the response variable.

I am trying to run a loop of ANOVA tests in R to check the relationship between my response variable and the categorical variables. My code is:

df_num <- function(x) {
  aov <- aov(as.numeric(sub3$Attrition) ~ sub3[, x], data = sub3)

  res <- data.frame('row' = 'Attrition'
                , 'column' = colnames(sub3)[x]
                ,  "p.value" = summary(aov)[[1]][["Pr(>F)"]]
                )
  return(res)
}
num_df <- do.call(rbind, lapply(seq_along(sub3)[-1], df_num))
head(num_df)

But my result is:

    row         column              p.value
1   Attrition   Age                 1.996802e-26
2   Attrition   Age                 NA
3   Attrition   DistanceFromHome    5.182860e-01
4   Attrition   DistanceFromHome    NA
5   Attrition   MonthlyIncome       3.842748e-02
6   Attrition   MonthlyIncome       NA

I don't understand why the code does not run for all the variables in the dataset, or why Age, DistanceFromHome and MonthlyIncome are repeated.

r anova
1 Answer

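Your code probably does run for all the variables, but head displays only the first 6 entries. Try running print(num_df), which prints every row of a plain data frame (if num_df were a tibble, you would use print(num_df, n = nrow(num_df)) instead).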

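The values in num_df are repeated because the summary table of the aov object you create has 2 rows, one for the predictor and one for the residuals, so subsetting the Pr(>F) column returns two values: the p-value and an NA for the residuals row. You can check this yourself by running the following, which computes the ANOVA for the Attrition and Age pair: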

aov <- aov(as.numeric(sub3$Attrition) ~ sub3[, 2], data = sub3)
summary(aov)[[1]][["Pr(>F)"]]  # this will report the p-value, and a NA value
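
The same behaviour can be reproduced without the original data. The following is a minimal sketch on the built-in mtcars data set (an assumption for illustration only; mpg and wt stand in for Attrition and Age), showing that the summary table has two rows and that the Pr(>F) entry for the residuals row is NA:

```r
# Illustration on the built-in mtcars data set (an assumption; not the asker's data).
fit <- aov(mpg ~ wt, data = mtcars)

# The summary table has one row for the predictor and one for the residuals.
tab <- summary(fit)[[1]]
print(tab)

p_col <- tab[["Pr(>F)"]]
length(p_col)    # 2: one entry per row of the table
is.na(p_col[2])  # TRUE: the residuals row has no F test, hence no p-value
```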

To fix the duplication, extract only the first value from the Pr(>F) column, like this:

df_num <- function(x) {
  aov <- aov(as.numeric(sub3$Attrition) ~ sub3[, x], data = sub3)

  res <- data.frame('row' = 'Attrition'
                , 'column' = colnames(sub3)[x]
                ,  "p.value" = summary(aov)[[1]][["Pr(>F)"]][1]  # use only the first value of the p-value column
                )
  return(res)
}
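
For a version that can be run end to end, here is a hedged sketch of the same idea on the built-in mtcars data set. The data set, the helper name p_for_column, and the use of reformulate() to build the formula from column names are all assumptions for illustration, not part of the original code:

```r
# Sketch on built-in mtcars: mpg plays the role of the response variable.
dat <- mtcars[, c("mpg", "wt", "hp", "qsec")]

# Hypothetical helper: one-way ANOVA of the response on a single column.
p_for_column <- function(col_name, response = "mpg", data = dat) {
  # reformulate() builds the formula response ~ col_name from strings
  fml <- reformulate(col_name, response = response)
  fit <- aov(fml, data = data)
  data.frame(
    row     = response,
    column  = col_name,
    # keep only the first entry; the second belongs to the residuals row and is NA
    p.value = summary(fit)[[1]][["Pr(>F)"]][1]
  )
}

predictors <- setdiff(names(dat), "mpg")
res <- do.call(rbind, lapply(predictors, p_for_column))
print(res)  # one row per predictor, no NA rows
```

Looping over column names rather than column positions keeps the term labels in the ANOVA table readable and avoids depending on the column order of the data frame.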