R - Error "variable lengths differ" when running a linear regression


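Overall goal: implement a cluster-then-predict approach to predict the house price variable MEDV. At the step where I fit a linear regression on each training-set cluster, I get this error: "Error in model.frame.default(formula = HDataTrain$MEDV ~ ., data = HDGroup1Train, : variable lengths differ (found for 'CRIM')"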

I have already tried editing the .csv file so that every value has the same length (.XXX, i.e. 3 decimal places).
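For context, the "variable lengths" in this message refer to how many observations each variable supplies to the model, not to how many digits a value has, so reformatting decimals in the .csv would not be expected to change it. A minimal sketch with made-up data (the names `full` and `part` are hypothetical and are not part of the housing script):

```r
# Hypothetical toy data, not the housing file
full <- data.frame(y = c(1.5, 2.25, 3.125, 4, 5.5, 6),
                   x = c(1, 2, 3, 4, 5, 6))
part <- full[1:3, ]          # a subset with fewer rows

# Response taken from `full` (6 values), predictors taken from `part` (3 rows):
# the lengths disagree, so building the model frame fails
res <- try(lm(full$y ~ ., data = part), silent = TRUE)
inherits(res, "try-error")

# Response and predictors both looked up in `part`: the lengths agree
fit <- lm(y ~ ., data = part)
nobs(fit)
```

The values in `full$y` deliberately have different numbers of decimals; the first call fails only because of the row-count mismatch.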

#Boston Housing Dataset Revisited

#read dataset
set.seed(100)
HData = read.csv("HousingData.csv")
HData$CRIM = as.factor(HData$CRIM)
is.factor(HData$CRIM)
str(HData)

# Step 1
# Training and test sets
library(caTools)     # sample.split
library(caret)       # preProcess
library(factoextra)  # fviz_nbclust
spl = sample.split(HData$MEDV, SplitRatio = 0.7)
HDataTrain = subset(HData, spl==TRUE)
HDataTest = subset(HData, spl==FALSE)

# Step 2: Clustering on Training set
# Dataset for clustering
HDCluster = HDataTrain[, -12]
HDClusterNorm = predict(preProcess(HDCluster), HDCluster)
KmeansHD = kmeans(HDClusterNorm, centers = 3)

# Training and Test Sets for each cluster
HDGroup1Train = subset(HDataTrain, KmeansHD$cluster == 1)
HDGroup2Train = subset(HDataTrain, KmeansHD$cluster == 2)
HDGroup3Train = subset(HDataTrain, KmeansHD$cluster == 3)

# Try some visualization
fviz_nbclust(HDClusterNorm, kmeans, method = "wss")

# Step 3: Prediction on Training set

HDataModel1 = lm(HDataTrain$MEDV ~., data = HDGroup1Train)
HDataModel2 = lm(HDataTrain$MEDV ~., data = HDGroup2Train)
HDataModel3 = lm(HDataTrain$MEDV ~., data = HDGroup3Train)
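A likely cause, judging from the error text: the formula `HDataTrain$MEDV ~ .` takes the response from the whole training set (one value per training row), while `data = HDGroup1Train` holds only the rows of cluster 1, so the two lengths disagree. Writing the response as a bare column name lets `lm` look it up inside each subset. It probably also keeps `MEDV` out of the predictors, since `.` expands to every column of `data` that is not named in the formula. The sketch below uses synthetic data as a stand-in for `HDataTrain` (an assumption, since HousingData.csv is not available here), base R `scale()` in place of `preProcess`, and hypothetical helper names `predictors`, `groups` and `models`:

```r
set.seed(100)

# Synthetic stand-in for the training set (assumption: the real HDataTrain
# has numeric predictors and a numeric MEDV column)
n <- 120
HDataTrain <- data.frame(
  CRIM = rexp(n),
  RM   = rnorm(n, mean = 6, sd = 0.7),
  AGE  = runif(n, 10, 100)
)
HDataTrain$MEDV <- 5 + 4 * HDataTrain$RM - 2 * HDataTrain$CRIM + rnorm(n)

# Cluster on the scaled predictors only (response excluded by name)
predictors <- HDataTrain[, setdiff(names(HDataTrain), "MEDV")]
KmeansHD <- kmeans(scale(predictors), centers = 3)

# One data frame per cluster
groups <- split(HDataTrain, KmeansHD$cluster)

# Bare column name in the formula: MEDV is looked up in each subset,
# so the response always has as many rows as the predictors
models <- lapply(groups, function(g) lm(MEDV ~ ., data = g))

sapply(models, nobs)   # rows used by each model
sapply(groups, nrow)   # rows in each cluster; should match the line above
```

In the original script the equivalent change would be `lm(MEDV ~ ., data = HDGroup1Train)`, and likewise for the other two clusters. Note also that `kmeans` needs numeric input, so converting `CRIM` with `as.factor` is likely to cause trouble at the clustering step.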

variables linear-regression cluster-analysis prediction variable-length