I am getting the following error on the cm.glm line:
Error: `data` and `reference` should be factors with the same levels.
# Predict using Logistic Regression
pred.glm <- ifelse(predict(fit.glm, irisTest) > 0.5, "setosa", "other")
cm.glm <- confusionMatrix(pred.glm, (irisTest$Species))
acc.glm <- cm.glm$overall['Accuracy']
prec.glm <- cm.glm$byClass['Pos Pred Value']
rec.glm <- cm.glm$byClass['Sensitivity']
# Load libraries
library(MASS)
library(caret)
library(nnet)
# Loading iris dataset
data(iris)
# Convert it into a binary class dataset
iris$Species <- ifelse(iris$Species == "setosa", "setosa", "other")
# Split the dataset
set.seed(123)
trainIndex <- createDataPartition(iris$Species, p = .8,
                                  list = FALSE,
                                  times = 1)
irisTrain <- iris[ trainIndex,]
irisTest <- iris[-trainIndex,]
# Fit Logistic Regression
fit.glm <- multinom(Species ~ ., data = iris)
# Predict using Logistic Regression
pred.glm <- ifelse(predict(fit.glm, irisTest) > 0.5, "setosa", "other")
cm.glm <- confusionMatrix(pred.glm, (irisTest$Species))
acc.glm <- cm.glm$overall['Accuracy']
prec.glm <- cm.glm$byClass['Pos Pred Value']
rec.glm <- cm.glm$byClass['Sensitivity']
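There are two problems with your code.
First, predict returns the most likely class by default. To compare the output against a cutoff, as you would with a predicted probability, you must pass type="probs" as an argument.
Second, confusionMatrix expects both arguments to be factors with the same levels. Just convert the vectors to factors. If one of the two sets does not contain both factor levels (which could happen with other seeds), specify the factor levels explicitly.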
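To see why the explicit levels matter, here is a minimal base R sketch. The vectors pred and truth are made-up toy data (not taken from the question) in which the model happens to never predict "setosa"; the names pred_f, truth_f and lv are likewise just illustrative.

```r
# Hypothetical toy vectors: the predictions contain only one of the two classes
pred  <- c("other", "other", "other")
truth <- c("setosa", "other", "other")

# Without explicit levels, each factor only knows the values it has seen,
# so the two level sets differ
levels(factor(pred))   # "other"
levels(factor(truth))  # "other" "setosa"

# With explicit levels, both factors share the same level set and order
lv <- c("setosa", "other")
pred_f  <- factor(pred,  levels = lv)
truth_f <- factor(truth, levels = lv)
identical(levels(pred_f), levels(truth_f))  # TRUE

# A plain cross-tabulation now has a row and a column for every class
table(pred_f, truth_f)
```

Listing "setosa" first in lv also puts it first in the level order, which is presumably why the corrected call uses that order.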
Your two lines of code should look like this:
pred.glm <- ifelse(predict(fit.glm, irisTest, type="probs") > 0.5, "setosa", "other")
cm.glm <- confusionMatrix(
factor(pred.glm, levels = c("setosa", "other")),
factor(irisTest$Species, levels = c("setosa", "other"))
)
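If it helps, the difference between the two predict outputs can be checked directly. This is only a sketch: it refits a two-class multinom model on the full iris data under the hypothetical names iris2 and fit, and passes trace = FALSE simply to silence the optimizer log.

```r
library(nnet)

data(iris)
iris2 <- iris
# Two-class target; factor() sorts the levels alphabetically: "other", "setosa"
iris2$Species <- factor(ifelse(iris2$Species == "setosa", "setosa", "other"))

fit <- multinom(Species ~ ., data = iris2, trace = FALSE)

# Default (type = "class"): a factor of predicted labels.
# Comparing a factor with > 0.5 is not meaningful and yields NA with a warning.
cls <- predict(fit, iris2)
is.factor(cls)

# type = "probs": for a two-class model, a numeric vector holding the
# probability of the second level (here "setosa"), which can be thresholded
prb <- predict(fit, iris2, type = "probs")
is.numeric(prb)

# Thresholding the probabilities gives character labels, as in the corrected line
lab <- ifelse(prb > 0.5, "setosa", "other")
table(lab, iris2$Species)
```

Since the probability refers to the second factor level, it is worth checking levels(iris2$Species) before deciding which label goes with the > 0.5 branch.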