Naive Bayes classifier error on the Kaggle Titanic dataset in R


I am trying to train a naive Bayes classifier on the Kaggle Titanic dataset (https://www.kaggle.com/c/titanic/data, files "train.csv" and "test.csv").

The code I have come up with so far is as follows:

library(e1071)

train_d <- read.csv("train.csv", stringsAsFactors = TRUE)

# columns chosen for training data-
# colnames(train_d)  OR names(train_d)
# "Survived", "Pclass", "Sex", "Age", "SibSp", "Parch","Embarked"
train_data <- train_d[, c(2:3, 5:8, 12)]

# to find out which columns contain NA (missing values)-
colnames(train_data)[apply(is.na(train_data), 2, any)]

# mean(train_data$Age, na.rm = TRUE)    # to find mean of 'Age' which contains 'NA'
# which(is.na(train_data$Age))

# fill in missing value (NA) with mean of 'Age' column-
train_data$Age[which(is.na(train_data$Age))] <- mean(train_data$Age, na.rm = TRUE)

# check whether there are any existing NAs-
which(is.na(train_data$Age))
# OR-
colnames(train_data)[apply(is.na(train_data), 2, any)]


test_d <- read.csv("test.csv", stringsAsFactors = TRUE)

# columns chosen for test data-
# "Pclass", "Sex", "Age", "SibSp", "Parch", "Embarked"
test_data <- test_d[, c(2, 4:7, 11)]

# find out missing values (NA)-
colnames(test_data)[apply(is.na(test_data), 2, any)]

# fill in missing value (NA) with mean of 'Age' column-
test_data$Age[which(is.na(test_data$Age))] <- mean(test_data$Age, na.rm = TRUE)

# check whether there are any existing NAs in the test data-
which(is.na(test_data$Age))
# OR-
colnames(test_data)[apply(is.na(test_data), 2, any)]




# training a naive-bayes classifier-
titanic_nb <- naiveBayes(Survived ~ Pclass + Sex + Age + SibSp + Parch + Embarked, data = train_data)


# predict using trained naive-bayes classifier-
output <- predict(titanic_nb, test_data, type = "class")

However, 'output' does not actually contain anything. Printing the 'output' variable gives:

> output
factor(0)
Levels: 

What is going wrong?

Thanks!

r machine-learning naivebayes kaggle
1 Answer

Here is the answer (the original question was deleted, hence the web-cache link).

The reason is that the model does not really know how to handle character columns, as you can see if you run data.matrix(test_data).
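A quick way to check how each column is stored, sketched on a small made-up data frame (the `toy` data below is hypothetical and only stands in for test_data):

```r
# Hypothetical toy data standing in for test_data
toy <- data.frame(
  Sex = c("male", "female", "female"),
  Age = c(22, 38, 26),
  stringsAsFactors = FALSE   # keep Sex as character on purpose
)

# Class of every column; character columns show up here
col_classes <- sapply(toy, class)
print(col_classes)

# data.matrix() coerces every column to numbers; in current R a character
# column is converted to a factor first and then to integer codes
m <- data.matrix(toy)
print(m)
```

If a column you expected to be a factor is reported as "character", that is the column to convert before training.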

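The solution is to first convert your character columns to factors, making sure the factor levels are consistent between the train and test sets.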
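A minimal sketch of that conversion, assuming small made-up data frames (`toy_train`, `toy_test`, and the helper `align_factors` are hypothetical names, not part of e1071). It also converts the response to a factor. That step is an assumption beyond this answer: as far as I can tell, passing a numeric 0/1 response to e1071's naiveBayes is a common way to end up with an empty `factor(0)` prediction.

```r
library(e1071)

# Hypothetical toy data, not the Titanic files
toy_train <- data.frame(
  Survived = c(0, 1, 1, 0, 1, 0),
  Sex      = c("male", "female", "female", "male", "female", "male"),
  Embarked = c("S", "C", "S", "Q", "C", "S"),
  Age      = c(22, 38, 26, 35, 27, 54),
  stringsAsFactors = FALSE
)
toy_test <- data.frame(
  Sex      = c("female", "male"),
  Embarked = c("S", "S"),
  Age      = c(30, 40),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn the named columns into factors, using the
# union of train and test values as the shared set of levels
align_factors <- function(train, test, cols) {
  for (col in cols) {
    lv <- sort(unique(c(as.character(train[[col]]),
                        as.character(test[[col]]))))
    train[[col]] <- factor(train[[col]], levels = lv)
    test[[col]]  <- factor(test[[col]],  levels = lv)
  }
  list(train = train, test = test)
}

aligned   <- align_factors(toy_train, toy_test, c("Sex", "Embarked"))
toy_train <- aligned$train
toy_test  <- aligned$test

# Assumption beyond the answer: make the response a factor as well
toy_train$Survived <- factor(toy_train$Survived)

fit  <- naiveBayes(Survived ~ Sex + Embarked + Age, data = toy_train)
pred <- predict(fit, toy_test, type = "class")
print(pred)
```

Taking the union of levels keeps the two data frames consistent even when a category appears in only one of them.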

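As a side note, I would suggest starting with a random forest, since it usually performs well without any parameter tuning and makes no assumption about the distribution of the variables (unlike NB, which assumes a Gaussian distribution for numeric predictors).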
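A rough sketch of the random forest suggestion, assuming the randomForest package is installed. The data frame `toy` and its label are synthetic and purely illustrative, and `ntree = 100` is an arbitrary choice.

```r
library(randomForest)

set.seed(42)  # reproducible toy data and forest
n <- 200
toy <- data.frame(
  Sex    = factor(sample(c("male", "female"), n, replace = TRUE)),
  Age    = round(runif(n, 1, 80)),
  Pclass = sample(1:3, n, replace = TRUE)
)
# Made-up label loosely tied to Sex and Pclass, for illustration only;
# it must be a factor so that randomForest does classification
toy$Survived <- factor(ifelse(toy$Sex == "female" | toy$Pclass == 1, 1, 0))

# Hold out 50 rows to predict on
train_idx <- sample(n, 150)
rf <- randomForest(Survived ~ Sex + Age + Pclass,
                   data = toy[train_idx, ], ntree = 100)

rf_pred <- predict(rf, newdata = toy[-train_idx, ])
print(table(predicted = rf_pred, actual = toy$Survived[-train_idx]))
```

As with naive Bayes, the response has to be a factor here; with a numeric response randomForest would fit a regression instead.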
