Excluding missing values from an lm fit without getting NaN


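I am running a linear regression (lm) on three variables, and I need to split it by treatment and soil type. My output is a separate summary for each soil*treatment combination, which I will use to make a set of contour plots. I can do this with the original dataset, which has no missing data, but the outliers skew the results. I would like to know how to run this code so that the missing values are excluded.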

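I subset the larger dataset to keep only the columns I want and to exclude two treatments within each soil type (this works with or without the factor and numeric conversions):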

CropSub <- subset(Pots1, !Treatment %in% c("Control1", "Control2"), 
select = c(as.factor(Soil), as.factor(Treatment), as.numeric(Drywt), as.numeric(Nrecovery),
as.numeric(Precovery)), na.action = function(x) x[, complete.cases(x)]) 
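For comparison, here is a minimal sketch of the same step with the missing-value filter applied separately. As far as I can tell, `subset()` has no `na.action` argument, so it is probably ignored in the call above, and `x[, complete.cases(x)]` would index columns rather than rows in any case. `Pots1_toy` and `CropSub_toy` are hypothetical stand-ins with invented values, not the real data.

```r
# Toy stand-in for Pots1; the values are invented for illustration only.
Pots1_toy <- data.frame(
  Soil      = c("Haverhill", "Haverhill", "Oxbow", "Oxbow", "Oxbow"),
  Treatment = c("Control1", "Manure10thaTSP", "Willow10tha", "Willow10tha", "Control2"),
  Drywt     = c(5.1, 8.89, NA, 4.4, 3.0),
  Nrecovery = c(1.0, 15.09, 17.34, 14.03, 2.0),
  Precovery = c(1.0, 11.26, 64.65, 15.84, 2.0)
)

# Drop the control treatments and keep only the modelling columns.
CropSub_toy <- subset(
  Pots1_toy,
  !Treatment %in% c("Control1", "Control2"),
  select = c(Soil, Treatment, Drywt, Nrecovery, Precovery)
)

# Keep only the rows (not columns) with no NA in any selected variable.
CropSub_toy <- CropSub_toy[complete.cases(CropSub_toy), ]

# Convert the grouping columns to factors after subsetting.
CropSub_toy$Soil      <- factor(CropSub_toy$Soil)
CropSub_toy$Treatment <- factor(CropSub_toy$Treatment)
```

Doing the type conversions after the subset keeps `select` as a plain list of column names.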

Then I run the lm models separately by soil and treatment type:

CropMod1 <- by(CropSub, list(CropSub$Soil, CropSub$Treatment), function(df) {
  CropMod1_lm <- lm(Drywt ~ Nrecovery + Precovery, data=df, na.action=na.exclude)
  return(CropMod1_lm)
})

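This all works fine when I run summary on a soil*treatment combination with no missing data, but as soon as a treatment has a missing value in any of the variables I selected, I get the following output: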

summary(CropMod1[["Haverhill", "CanolaMeal10thaTSP"]])
In case the image does not display: the coefficient table shows NaN in the Std. Error, t value and Pr(>|t|) columns:

            Estimate Std. Error t value Pr(>|t|)
(Intercept)  9.44438        NaN     NaN      NaN
Nrecovery    0.09192        NaN     NaN      NaN
Precovery    0.04506        NaN     NaN      NaN
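One possible reading of this table, sketched on the assumption that the Haverhill / CanolaMeal10thaTSP group holds the same four rows as in the example data further down: once the row with the missing Drywt is dropped, three observations remain for three coefficients, so the fit is exact and there are no residual degrees of freedom left from which to estimate standard errors. The names `grp` and `fit` are introduced here for illustration.

```r
# The three complete Haverhill / CanolaMeal10thaTSP rows from the example data
# (the fourth row has Drywt = NA and would be dropped by lm).
grp <- data.frame(
  Drywt     = c(10.75, 10.69, 10.81),
  Nrecovery = c(8.90, 8.61, 9.44),
  Precovery = c(10.82, 10.08, 11.05)
)

fit <- lm(Drywt ~ Nrecovery + Precovery, data = grp)

nobs(fit)         # number of observations actually used: 3
df.residual(fit)  # observations minus estimated coefficients: 0
coef(fit)         # estimates can still be computed
```

If that is what is happening, the missing value is already being excluded, and the NaN entries come from the group being too small for this model rather than from the NA handling.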

To exclude the missing values I have tried the following, with no change in the result:

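  • Adding 'na.strings = ""' to the csv import call
  • Replacing all blanks in my dataset with NA
  • Omitting "na.action = na.exclude" in the lm call, or replacing it with na.omit or complete.cases
  • Omitting the complete.cases argument in the subset function, or replacing it with na.exclude or na.omit
  • na.rm=TRUE does not work in either the subset or the lm function.
  • Using
    CropSub1 <- subset(CropSub, !is.na(Drywt) & !is.na(Nrecovery) & !is.na(Precovery))
    to exclude the missing data
  • Using
    CropSub <- na.exclude(CropSub)
    to exclude the missing data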
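A small sketch of why swapping `na.omit` for `na.exclude` would not be expected to change the coefficient table: as far as I understand, both drop the same incomplete rows before fitting and differ only in how `residuals()` and `fitted()` are padded afterwards. The data frame `toy` below is invented for illustration.

```r
# Invented toy data with one missing response value.
toy <- data.frame(
  y = c(1.2, NA, 2.9, 4.1, 5.2, 5.8),
  x = c(1, 2, 3, 4, 5, 6)
)

fit_omit    <- lm(y ~ x, data = toy, na.action = na.omit)
fit_exclude <- lm(y ~ x, data = toy, na.action = na.exclude)

# The same rows are used for fitting, so the estimates agree.
all.equal(coef(fit_omit), coef(fit_exclude))

# The difference only shows in the length of the returned residuals:
length(residuals(fit_omit))     # one per complete row
length(residuals(fit_exclude))  # one per original row, NA where y was missing
```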

Example data:

data <- data.frame(
  Soil = c("Haverhill", "Haverhill", "Haverhill", "Haverhill", "Haverhill", "Haverhill", "Haverhill", "Haverhill", "Oxbow", "Oxbow", "Oxbow", "Oxbow", "Oxbow", "Oxbow", "Oxbow", "Oxbow"),
  Treatment = c("CanolaMeal10thaTSP", "CanolaMeal10thaTSP", "CanolaMeal10thaTSP", "CanolaMeal10thaTSP", "Manure10thaTSP", "Manure10thaTSP", "Manure10thaTSP", "Manure10thaTSP", "CanolaHull10tha", "CanolaHull10tha", "CanolaHull10tha", "CanolaHull10tha", "Willow10tha", "Willow10tha", "Willow10tha", "Willow10tha"),
  Drywt = c(NA, 10.75, 10.69, 10.81, 8.89, 9.83, 9.9, 9.31, 4.12, 4.78, 4.74, 3.75, 8.5, 4.4, 5.25, 3.45),
  Nrecovery = c(7.13, 8.9, 8.61, 9.44, 15.09, 14.32, 20.41, 18.04, NA, 20.24, 16.24, 15.9, 17.34, 14.03, 23.73, 15.56),
  Precovery = c(8.69, 10.82, 10.08, 11.05, 11.26, 11.46, 14.95, 13.94, 3.37, 22.15, 9.88, 3.78, 64.65, 15.84, 41.3, -0.47)
)
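With the example data above, counting the complete rows in each soil and treatment combination may help show which groups are affected. This is only a sketch: `complete`, `groups`, `n_complete` and `resid_df` are names introduced here, and the data frame is rebuilt compactly with `rep()` so the block runs on its own.

```r
# Same values as the example data, written compactly with rep().
data <- data.frame(
  Soil = rep(c("Haverhill", "Oxbow"), each = 8),
  Treatment = rep(c("CanolaMeal10thaTSP", "Manure10thaTSP",
                    "CanolaHull10tha", "Willow10tha"), each = 4),
  Drywt = c(NA, 10.75, 10.69, 10.81, 8.89, 9.83, 9.9, 9.31,
            4.12, 4.78, 4.74, 3.75, 8.5, 4.4, 5.25, 3.45),
  Nrecovery = c(7.13, 8.9, 8.61, 9.44, 15.09, 14.32, 20.41, 18.04,
                NA, 20.24, 16.24, 15.9, 17.34, 14.03, 23.73, 15.56),
  Precovery = c(8.69, 10.82, 10.08, 11.05, 11.26, 11.46, 14.95, 13.94,
                3.37, 22.15, 9.88, 3.78, 64.65, 15.84, 41.3, -0.47)
)

# Flag rows with no NA in the three model variables.
data$complete <- complete.cases(data[, c("Drywt", "Nrecovery", "Precovery")])

# Count complete rows per soil * treatment combination.
groups <- aggregate(complete ~ Soil + Treatment, data = data, FUN = sum)
names(groups)[3] <- "n_complete"

# The model has 3 coefficients (intercept + 2 slopes), so a group needs
# more than 3 complete rows to leave any residual degrees of freedom.
groups$resid_df <- groups$n_complete - 3
groups
```

In this example the two groups that contain an NA end up with three complete rows each, which leaves zero residual degrees of freedom for a three-coefficient model.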
r nan missing-data lm