Ran a caret model and it stopped: it mentions missing values in the resampled performance measures


I am a beginner working on the Titanic problem ([Dataset]). When I went to train on the dataset, this is the problem I ran into:

[data_prepro_maf_train]

all_model<-modelLookup()
classification_model<-all_model%>%filter(forClass==TRUE,!duplicated(model))
class_model<-classification_model$model
set.seed(123)
number<-3
repeats<-2
control<-trainControl(method="repeatedcv",number=number,repeats=repeats,classProbs = TRUE,savePredictions = "final",index=createResample(data_prepro_maf_train$Embarked,repeats*number),summaryFunction = multiClassSummary,allowParallel = TRUE)
x<-data_prepro_maf_train[,c(1,3,5,6,7,8)]
y<-data_prepro_maf_train[,12]
levels(y)<-make.names(levels(factor(data_prepro_maf_train[,12])))
y<-make.names(data_prepro_maf_train[,12],unique=TRUE,allow_=TRUE)
#Train the models
model_list1<-caretList(x,y,data=data_prepro_maf_train,trControl = control,metric="Accuracy",methodList = class_model[1])

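I made sure the selected columns exclude ones with missing values such as "Cabin", and I have already removed the missing values from the columns I need.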
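The message is about missing *performance* values, which is typically a sign that the models failed on some resamples rather than that the predictors contain NA. One thing worth checking is the outcome: `make.names(..., unique = TRUE)` turns every repeated label into a new one, so `y` ends up as a character vector in which every element is a distinct "class". A minimal base-R check, using a made-up `embarked` vector rather than the real data:

```r
# Hypothetical sample of Embarked labels (not the real Titanic data)
embarked <- c("S", "C", "S", "Q", "S")

# unique = TRUE makes repeated labels distinct by appending ".1", ".2", ...
# so y_unique is c("S", "C", "S.1", "Q", "S.2"): five "classes" for five rows
y_unique <- make.names(embarked, unique = TRUE)

# Without unique = TRUE the labels stay as three classes; a factor with
# syntactically valid level names is what classProbs = TRUE expects
y_factor <- factor(make.names(embarked))
levels(y_factor)  # "C" "Q" "S"
```

If this is the cause, something like `y <- factor(make.names(data_prepro_maf_train[,12]))` would keep the original three classes; this is an assumption about the data, not something confirmed in the thread.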

Packages used:

library(caret)
library(caretEnsemble)
library(tidyverse)
library(magrittr)
library(doParallel)
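Separately, `index = createResample(...)` in the `trainControl()` call supplies bootstrap samples (rows drawn with replacement), and as far as I know caret trains on whatever rows `index` lists, so the resampling is effectively a bootstrap despite `method = "repeatedcv"`. If repeated k-fold was intended, caret's `createMultiFolds(y, k, times)` is the helper meant for that. The sketch below uses hypothetical base-R helpers (not caret functions, and without the stratification caret applies) only to show how the two kinds of index lists differ:

```r
# Hypothetical helpers (NOT caret functions) building lists of training-row
# indices of the kind trainControl(index = ...) accepts.

# Bootstrap: each training set has n rows drawn with replacement
bootstrap_index <- function(n, times) {
  lapply(seq_len(times), function(i) sort(sample.int(n, n, replace = TRUE)))
}

# Repeated k-fold: each training set is all rows minus one fold (no duplicates)
repeated_kfold_index <- function(n, k, times) {
  out <- list()
  for (r in seq_len(times)) {
    fold_id <- sample(rep_len(seq_len(k), n))  # random (unstratified) folds
    for (f in seq_len(k)) {
      out[[sprintf("Fold%d.Rep%d", f, r)]] <- which(fold_id != f)
    }
  }
  out
}

set.seed(123)
n <- 12
boot <- bootstrap_index(n, times = 6)
cv   <- repeated_kfold_index(n, k = 3, times = 2)

length(cv)    # 6 training sets: 3 folds x 2 repeats
lengths(cv)   # each has 8 rows (12 rows minus a held-out fold of 4)
anyDuplicated(boot[[1]]) > 0  # usually TRUE: bootstrap samples repeat rows
```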
Tags: r, r-caret
1 Answer

I researched the problem and found a way around the interruption. The fixes for my problem may be:

1) One-hot encoding: a preprocessing step that converts each factor in the training data into numeric 0/1 indicator columns

2) How the arguments are passed (the x/y interface):

x<-data_prepro_maf_train[,c(1,3,5,6,7,8)]
y<-data_prepro_maf_train[,12]
model_list1<-caretList(x,y,data=data_prepro_maf_train,trControl = control,metric="Accuracy",methodList = class_model[1])

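I changed this to the y ~ X1 + X2 + X3 (formula) style, and now caretList at least runs some of the models (see the discussion of the formula vs. non-formula interface [1]).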
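Both fixes come down to the same thing: many model functions need a numeric predictor matrix. As far as I understand caret, the formula interface builds that matrix by expanding each factor into 0/1 indicator columns, whereas the x/y interface passes the data frame through as-is. A minimal base-R illustration with `model.matrix`, on a small made-up data frame (not the real Titanic data):

```r
# Hypothetical toy data standing in for the Titanic predictors
df <- data.frame(
  Sex = factor(c("male", "female", "female", "male")),
  Age = c(22, 38, 26, 35)
)

# Full one-hot encoding: "- 1" drops the intercept, so each level of Sex
# gets its own 0/1 column (Sexfemale, Sexmale), plus the numeric Age column
X <- model.matrix(~ Sex + Age - 1, data = df)

# Formula-style expansion with an intercept, then dropping that column:
# one level of Sex becomes the baseline, leaving only Sexmale and Age
X_formula <- model.matrix(~ Sex + Age, data = df)[, -1]

colnames(X)          # "Sexfemale" "Sexmale" "Age"
colnames(X_formula)  # "Sexmale" "Age"
```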

Here are the changes I made:

# One-hot encode the predictor columns of data_prepro_maf_train
dummy_model1<-dummyVars(title~.,data=data_prepro_maf_train[c(1,2,3,5,6,7,8,10)])

data_train_mat1<-predict(dummy_model1,newdata=data_prepro_maf_train)

data_prepro_maf_train2<-data.frame(data_train_mat1)

#Add back columns “title” and “Embarked”, which have vital factors for the model
data_prepro_maf_train2<-cbind(data_prepro_maf_train$Embarked,data_prepro_maf_train$title,data_prepro_maf_train2)

colnames(data_prepro_maf_train2)[1]<-"Embarked"
colnames(data_prepro_maf_train2)[2]<-"title"
# Make the factor levels consistent in the new training data. If the error below shows up, run this line again before running model_list2 (not sure why the change is not saved):
# Error: One or more factor levels in the outcome has no data: 'Q'

# droplevels() returns a new factor without unused levels, so assign it back to the column
data_prepro_maf_train2$Embarked<-droplevels(data_prepro_maf_train2$Embarked)

set.seed(123)
number<-3
repeats<-2
control<-trainControl(method="repeatedcv",number=number,repeats=repeats,classProbs = TRUE,savePredictions = "all",index=createResample(data_prepro_maf_train$Embarked,repeats*number),summaryFunction = multiClassSummary,allowParallel = TRUE)
# class_model has over 100 models, so select a few familiar ones to test against the previous error. (I came across preProcess = c("center", "scale"), which was said to help in my situation; I'm not sure how it works and would appreciate it if someone could explain it.)
model_list2<-caretList(Embarked~title+Pclass+Age+Sex.male+Sex.female+SibSp+Parch,data=data_prepro_maf_train2,preProcess = c("center", "scale"),trControl = control,metric="Accuracy",methodList = class_model[c(37,52,55,68,102,145,167,189)])
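The `'Q'` error means the outcome factor still carries a level with zero rows. `droplevels()` returns a new factor, so its result has to be assigned back to the column; `levels(x) <- droplevels(x)` instead tries to use the whole vector as the level labels, which can fail or relabel values, and may be why the change did not seem to stick. A minimal sketch with a made-up `train2` data frame standing in for `data_prepro_maf_train2`:

```r
# Hypothetical stand-in for data_prepro_maf_train2: the outcome factor still
# carries a level "Q" that has no rows (e.g. after rows were filtered out)
train2 <- data.frame(
  Embarked = factor(c("S", "C", "S"), levels = c("C", "Q", "S")),
  Age      = c(22, 38, 26)
)
table(train2$Embarked)  # C = 1, Q = 0, S = 2: the empty "Q" level is the problem

# droplevels() returns a new factor without unused levels; assign it back
train2$Embarked <- droplevels(train2$Embarked)
levels(train2$Embarked)  # "C" "S"
```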

Not sure whether this is the final fix for my problem... but at least the models now run instead of stopping without any results.
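On the `preProcess = c("center", "scale")` question: centering subtracts each numeric predictor's mean and scaling divides by its standard deviation, so every predictor ends up with mean 0 and standard deviation 1. As I understand caret, these statistics are estimated on the training rows of each resample. This mainly helps models sensitive to predictor scale (e.g. k-nearest neighbours, SVMs, neural networks); it is not in itself a fix for missing values. A minimal base-R sketch of the arithmetic on a made-up `age` vector:

```r
# Hypothetical Age values
age <- c(22, 38, 26, 35, 54)

# "center" then "scale": subtract the mean, divide by the standard deviation
age_std <- (age - mean(age)) / sd(age)

# Base R's scale() performs the same computation
stopifnot(isTRUE(all.equal(as.vector(scale(age)), age_std)))

mean(age_std)  # approximately 0 (up to floating-point error)
sd(age_std)    # 1
```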

© www.soinside.com 2019 - 2024. All rights reserved.