Model accuracy differs when re-running preProcess(), predict() and train() in R (caret)

Question (votes: 1, answers: 1)

The data below is just an example; what confuses me is the operation performed on this, or any, data:

library(caret)
set.seed(3433)
data(AlzheimerDisease)
complete <- data.frame(diagnosis, predictors)
in_train <- createDataPartition(complete$diagnosis, p = 0.75)[[1]]
training <- complete[in_train,]
testing <- complete[-in_train,]
predIL <- grep("^IL", names(training))
smalltrain <- training[, c(1, predIL)]

fit_noPCA <- train(diagnosis ~ ., method = "glm", data = smalltrain)
pre_proc_obj <- preProcess(smalltrain[,-1], method = "pca", thresh = 0.8)
smalltrainsPCs <- predict(pre_proc_obj, smalltrain[,-1])
fit_PCA <- train(x = smalltrainsPCs, y = smalltrain$diagnosis, method = "glm")
fit_noPCA$results$Accuracy
fit_PCA$results$Accuracy

When I run this code, fit_noPCA has an accuracy of 0.689539 and fit_PCA has an accuracy of 0.682951. But when I re-run the last part of the code:

fit_noPCA <- train(diagnosis ~ ., method = "glm", data = smalltrain)
pre_proc_obj <- preProcess(smalltrain[,-1], method = "pca", thresh = 0.8)
smalltrainsPCs <- predict(pre_proc_obj, smalltrain[,-1])
fit_PCA <- train(x = smalltrainsPCs, y = smalltrain$diagnosis, method = "glm")
fit_noPCA$results$Accuracy
fit_PCA$results$Accuracy

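I get different accuracy values every time I re-run these 6 lines. Why is that? Is it because I did not reset the seed? Even so, where is the inherent randomness in this procedure?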

r machine-learning r-caret glm
1 Answer
0 votes
By default, train() fits the model using bootstrap resampling, as you can see in the printed fit:

> fit_noPCA
Generalized Linear Model

251 samples
 12 predictor
  2 classes: 'Impaired', 'Control'

No pre-processing
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 251, 251, 251, 251, 251, 251, ...
Resampling results:

  Accuracy   Kappa
  0.6870006  0.04107016

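So each call to train() draws different bootstrap samples, and the reported accuracy is the average over those resamples. To get the same result back, set the seed before running train():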

set.seed(111)
fit_PCA <- train(x = smalltrainsPCs, y = smalltrain$diagnosis, method = "glm",
                 trControl = trainControl(method = "boot", number = 100))
fit_PCA$results$Accuracy
[1] 0.6983512

set.seed(112)
fit_PCA <- train(x = smalltrainsPCs, y = smalltrain$diagnosis, method = "glm",
                 trControl = trainControl(method = "boot", number = 100))
fit_PCA$results$Accuracy
[1] 0.6991537

set.seed(111)
fit_PCA <- train(x = smalltrainsPCs, y = smalltrain$diagnosis, method = "glm",
                 trControl = trainControl(method = "boot", number = 100))
fit_PCA$results$Accuracy
[1] 0.6983512
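To see where the randomness enters, it may help to look at the resampling step on its own. The sketch below uses caret's createResample() to draw bootstrap indices, which, as far as I understand, is close to what train() does internally for method = "boot". The outcome vector y is a hypothetical stand-in for smalltrain$diagnosis, not the real data.

```r
library(caret)

# Hypothetical two-class outcome standing in for smalltrain$diagnosis
y <- factor(rep(c("Impaired", "Control"), times = c(70, 181)))

set.seed(111)
boot_a <- createResample(y, times = 25)  # 25 bootstrap index vectors
boot_b <- createResample(y, times = 25)  # RNG state has advanced since boot_a

set.seed(111)
boot_c <- createResample(y, times = 25)  # same seed as boot_a

# Without resetting the seed the resamples differ; with it they match
same_without_reset <- identical(boot_a, boot_b)
same_with_reset    <- identical(boot_a, boot_c)
print(c(without_reset = same_without_reset, with_reset = same_with_reset))
```

The glm fit itself is deterministic given its data, so the only moving part here is which rows land in each bootstrap sample.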

Or, if you use cross-validation (cv) for example, you can define the folds yourself with the index= argument of trainControl().
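A minimal sketch of the index= approach, assuming toy data in place of smalltrainsPCs and smalltrain$diagnosis (the names x, y, folds and ctrl are made up for illustration). The folds are created once with createFolds() and then reused, so repeated train() calls evaluate on the same splits even without resetting the seed in between.

```r
library(caret)

# Hypothetical toy data standing in for smalltrainsPCs / smalltrain$diagnosis
set.seed(1)
n <- 200
x <- data.frame(PC1 = rnorm(n), PC2 = rnorm(n))
y <- factor(ifelse(x$PC1 + rnorm(n) > 0, "Impaired", "Control"))

# Fix the folds once; index= expects the training rows of each resample
set.seed(111)
folds <- createFolds(y, k = 10, returnTrain = TRUE)
ctrl  <- trainControl(method = "cv", index = folds)

# No set.seed() between these calls, yet both use the same folds
fit_1 <- train(x = x, y = y, method = "glm", trControl = ctrl)
fit_2 <- train(x = x, y = y, method = "glm", trControl = ctrl)

fit_1$results$Accuracy
fit_2$results$Accuracy
```

Both fits should report the same accuracy, since glm has no random component of its own once the resampling indices are fixed. Models with internal randomness (random forests, for example) would still need a seed.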