Why can't I use cv.glm on the output of bestglm?


I am trying to do best subset selection on the wine dataset, and then use 10-fold CV to estimate the test error rate. The code I used is:

library(bestglm)   # bestglm()
library(boot)      # cv.glm()
# misclassification rate: observed 0/1 response vs. predicted probability
cost1 <- function(good, pi=0) mean(abs(good-pi) > 0.5)
res.best.logistic <-
    bestglm(Xy = winedata,
            family = binomial,          # binomial family for logistic
            IC = "AIC",                 # information criterion
            method = "exhaustive")
res.best.logistic$BestModels
best.cv.err <- cv.glm(winedata, res.best.logistic$BestModel, cost1, K=10)

However, this gives the error:

Error in UseMethod("family") : no applicable method for 'family' applied to an object of class "NULL"

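I thought $BestModel was an lm object representing the best fit, which is also what the manual says. If that is the case, why can't I use cv.glm to estimate its test error with 10-fold CV?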

The dataset is the white wine data from https://archive.ics.uci.edu/ml/datasets/Wine+Quality, and the packages used are boot (for cv.glm) and bestglm.

The data were processed as follows:

winedata <- read.delim("winequality-white.csv", sep = ';')
# recode quality as a binary factor: "1" if quality >= 7, otherwise "0"
winedata$quality <- factor(ifelse(winedata$quality >= 7, "1", "0"))
names(winedata)[names(winedata) == "quality"] <- "good"      # rename 'quality' to 'good'
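For reference, cv.glm calls the cost function as cost(y, yhat), with the observed responses first and the fitted values (for a binomial glm, predicted probabilities) second, so cost1 above is the misclassification rate at a 0.5 threshold. A toy check with made-up numbers (observed and predicted are hypothetical values, not wine data):

```r
# cost1 from the question: fraction of cases whose predicted probability
# lies on the wrong side of 0.5, i.e. the misclassification rate
cost1 <- function(good, pi = 0) mean(abs(good - pi) > 0.5)

observed  <- c(0, 1, 1, 0)          # hypothetical 0/1 responses
predicted <- c(0.2, 0.7, 0.4, 0.6)  # hypothetical fitted probabilities
err <- cost1(observed, predicted)
print(err)  # 0.5: the 3rd and 4th cases are misclassified
```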
1 Answer

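bestglm rearranges your data and renames your response variable to y. cv.glm refits the model on each fold by re-evaluating the model's stored call with data replaced by a subset of winedata, so when you pass the bestglm fit back to cv.glm, that call no longer makes sense: winedata has no y column, and the call's family and weights arguments were variables inside bestglm (evaluated from your workspace, family is the stats function family(), hence the error), so everything after that fails.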
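The failure can be reproduced without bestglm. This is a minimal sketch in base R on simulated data; fit_inside and toy are hypothetical names, with fit_inside standing in for bestglm by fitting the model inside a function whose local variables (y, family) mimic the ones shown in the BestModel call. It then does by hand what cv.glm does on each fold: replace data in the stored call and evaluate it from the top level.

```r
# Hypothetical stand-in for bestglm: renames the response to y and passes
# a local variable called `family` to glm()
fit_inside <- function(d) {
  names(d)[names(d) == "good"] <- "y"
  family <- binomial
  glm(y ~ ., family = family, data = d)
}

set.seed(1)
toy <- data.frame(x = rnorm(200))            # simulated predictor
toy$good <- rbinom(200, 1, plogis(toy$x))    # simulated 0/1 response

fit <- fit_inside(toy)
print(fit$call)  # the call refers to the local names y, family and d

# Mimic cv.glm's refit: swap in new data and evaluate the stored call here,
# where `family` now resolves to the stats function family()
Call <- fit$call
Call$data <- toy[1:150, ]
msg <- tryCatch({ eval(Call); "no error" },
                error = function(e) conditionMessage(e))
print(msg)  # an error mentioning 'family', as in the question
```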

It is always a good idea to check the class:

class(res.best.logistic$BestModel)
[1] "glm" "lm" 

However, look at the call stored in res.best.logistic$BestModel:

res.best.logistic$BestModel$call

glm(formula = y ~ ., family = family, data = Xi, weights = weights)

head(res.best.logistic$BestModel$model)
  y fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
1 0           7.0             0.27        0.36           20.7     0.045
2 0           6.3             0.30        0.34            1.6     0.049
3 0           8.1             0.28        0.40            6.9     0.050
4 0           7.2             0.23        0.32            8.5     0.058
5 0           7.2             0.23        0.32            8.5     0.058
6 0           8.1             0.28        0.40            6.9     0.050
  free.sulfur.dioxide density   pH sulphates
1                  45  1.0010 3.00      0.45
2                  14  0.9940 3.30      0.49
3                  30  0.9951 3.26      0.44
4                  47  0.9956 3.19      0.40
5                  47  0.9956 3.19      0.40
6                  30  0.9951 3.26      0.44
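Since the fitted BestModel already records which predictors were kept, another way to recover their names is from its terms object. A minimal sketch on simulated data, where best_model is a hypothetical stand-in for res.best.logistic$BestModel; term labels give variable names rather than coefficient names, which matters only if some predictors are factors (all wine predictors are numeric):

```r
set.seed(7)
d <- data.frame(a = rnorm(100), b = rnorm(100))   # simulated predictors
d$y <- rbinom(100, 1, plogis(d$a))                # simulated 0/1 response

# stand-in for res.best.logistic$BestModel
best_model <- glm(y ~ a + b, data = d, family = binomial)

# predictor names kept in the model, then a formula using the original
# response name ("good" in the question's winedata)
best_var <- attr(terms(best_model), "term.labels")
new_form <- reformulate(best_var, response = "good")
print(new_form)  # good ~ a + b
```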

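You could substitute the right objects into the call, but that gets messy. Fitting is cheap, so instead refit the selected model on winedata and pass that fit to cv.glm: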

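# BestModels has one logical column per predictor (TRUE = included) plus a
# final Criterion column; row 1 is the best model, so drop Criterion
in_best <- unlist(res.best.logistic$BestModels[1, -ncol(winedata)])
# take the variable names for the best model
best_var <- names(in_best)[as.logical(in_best)]
new_form <- reformulate(best_var, response = "good")
fit <- glm(new_form, data = winedata, family = binomial)

set.seed(1)   # cv.glm assigns observations to folds at random
best.cv.err <- cv.glm(winedata, fit, cost1, K = 10)
best.cv.err$delta   # raw and bias-adjusted CV misclassification rate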
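If you would rather not depend on cv.glm re-evaluating the model's stored call at all, the same kind of 10-fold misclassification estimate can be computed by hand. This is a swapped-in alternative, not part of boot or bestglm: kfold_misclass is a hypothetical helper, shown here on simulated data, and it assumes a binary response coded as a "0"/"1" factor or 0/1 numbers. With the answer's objects it would be called as kfold_misclass(new_form, winedata, K = 10).

```r
# Manual K-fold cross-validated misclassification rate for a logistic glm.
# Note: set.seed() inside the function changes the global RNG state.
kfold_misclass <- function(form, data, K = 10, seed = 1) {
  set.seed(seed)
  n <- nrow(data)
  folds <- sample(rep(seq_len(K), length.out = n))   # random fold labels
  resp <- all.vars(form)[1]                           # response column name
  y <- as.numeric(as.character(data[[resp]]))         # factor "0"/"1" -> 0/1
  errs <- numeric(K)
  for (k in seq_len(K)) {
    test <- folds == k
    fit <- glm(form, data = data[!test, ], family = binomial)
    p <- predict(fit, newdata = data[test, ], type = "response")
    errs[k] <- mean(abs(y[test] - p) > 0.5)           # same rule as cost1
  }
  sum(errs * tabulate(folds, K)) / n   # fold errors weighted by fold size
}

# Usage on simulated data (not the wine data)
set.seed(42)
sim <- data.frame(a = rnorm(300), b = rnorm(300), c = rnorm(300))
sim$good <- factor(rbinom(300, 1, plogis(1.5 * sim$a - sim$b)))
cv_err <- kfold_misclass(good ~ a + b, sim, K = 10)
print(cv_err)
```

Weighting each fold's error by its size mirrors how cv.glm combines folds into the first component of delta, though the fold assignment itself will differ.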
© www.soinside.com 2019 - 2024. All rights reserved.