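I am trying to perform best subset selection on the wine dataset, and then I want to estimate the test error rate using 10-fold CV. The code I used is: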
cost1 <- function(good, pi=0) mean(abs(good-pi) > 0.5)
res.best.logistic <-
bestglm(Xy = winedata,
family = binomial, # binomial family for logistic
IC = "AIC", # Information criteria
method = "exhaustive")
res.best.logistic$BestModels
best.cv.err<- cv.glm(winedata,res.best.logistic$BestModel,cost1, K=10)
However, this gives the error:
Error in UseMethod("family") : no applicable method for 'family' applied to an object of class "NULL"
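I thought $BestModel was the lm object representing the best fit, which is also what the manual says. If that is the case, why can't I estimate its test error with 10-fold CV using cv.glm?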
[The dataset used is the white wine dataset from https://archive.ics.uci.edu/ml/datasets/Wine+Quality. The packages used are boot (for cv.glm) and bestglm.]
The data was processed as:
winedata <- read.delim("winequality-white.csv", sep = ';')
winedata$quality[winedata$quality< 7] <- "0" #recode
winedata$quality[winedata$quality>=7] <- "1" #recode
winedata$quality <- factor(winedata$quality)# Convert the column to a factor
names(winedata)[names(winedata) == "quality"] <- "good" #rename 'quality' to 'good'
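bestglm rearranges your data and renames your response variable to y. So if you pass the fit back into cv.glm, winedata does not have a column y, and everything crashes after that.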
It is always good to check what the class is:
class(res.best.logistic$BestModel)
[1] "glm" "lm"
But if you look at the call of res.best.logistic$BestModel:
res.best.logistic$BestModel$call
glm(formula = y ~ ., family = family, data = Xi, weights = weights)

head(res.best.logistic$BestModel$model)
  y fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
1 0           7.0             0.27        0.36           20.7     0.045
2 0           6.3             0.30        0.34            1.6     0.049
3 0           8.1             0.28        0.40            6.9     0.050
4 0           7.2             0.23        0.32            8.5     0.058
5 0           7.2             0.23        0.32            8.5     0.058
6 0           8.1             0.28        0.40            6.9     0.050
  free.sulfur.dioxide density   pH sulphates
1                  45  1.0010 3.00      0.45
2                  14  0.9940 3.30      0.49
3                  30  0.9951 3.26      0.44
4                  47  0.9956 3.19      0.40
5                  47  0.9956 3.19      0.40
6                  30  0.9951 3.26      0.44
You could substitute things in the call, but that gets too messy. Fitting is not expensive, so refit the model on winedata and pass that to cv.glm:
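# BestModels has one TRUE/FALSE column per predictor plus a final criterion column;
# its first row is the best model
bm = res.best.logistic$BestModels
# take the variable names for the best model
# (indexing row 1 directly avoids apply() returning a matrix when all rows select the same number of variables)
best_var = names(bm)[-ncol(bm)][unlist(bm[1, -ncol(bm)])]
new_form = as.formula(paste("good ~", paste(best_var, collapse = "+")))
# refit on the original data so the call refers to winedata and good
fit = glm(new_form, data = winedata, family = "binomial")
best.cv.err <- cv.glm(winedata, fit, cost1, K = 10)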
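To try the refit idea without downloading the wine data or installing bestglm, here is a minimal self-contained sketch on simulated data. Everything in it is an assumption made for illustration: the data frame dat, the columns x1 to x3, and the objects Xi and internal_fit, which only imitate a model fitted on an internal copy with the response renamed to y. It also reads the selected predictors from the fitted model's terms rather than from BestModels, which is a different route from the one above.

```r
library(boot)  # provides cv.glm

set.seed(1)
n <- 200
# Simulated stand-in for winedata (hypothetical columns x1..x3)
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$good <- factor(rbinom(n, 1, plogis(1.5 * dat$x1 - dat$x2)))

# Imitate the problem: a model fitted on an internal copy of the data
# whose response column is called y instead of good
Xi <- data.frame(dat[, c("x1", "x2")], y = dat$good)
internal_fit <- glm(y ~ ., family = binomial, data = Xi)

# Recover the selected predictors from the fitted model's terms
best_var <- attr(terms(internal_fit), "term.labels")

# Refit against the original data frame so the call refers to dat and good
new_form <- reformulate(best_var, response = "good")
fit <- glm(new_form, data = dat, family = binomial)

# Misclassification cost at a 0.5 threshold, as in the question
cost1 <- function(good, pi = 0) mean(abs(good - pi) > 0.5)

cv_res <- cv.glm(dat, fit, cost1, K = 10)
cv_res$delta  # raw and adjusted 10-fold CV estimates of the error rate
```

Because fit was created by a call that mentions only dat and new_form, cv.glm can re-evaluate that call on each training fold, which is exactly what fails with the object returned by bestglm.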