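Here is a sample of the 500-record dataset I used to build the model. I want to predict whether a person of a given age and salary purchased the car: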
Age EstimatedSalary Purchased
23 20000 0
47 32000 1
31 25000 0
Here is the code:
#Logistic Regression
# importing the dataset and keeping the Age, EstimatedSalary and Purchased columns
dataset=read.csv('Car_Ads.csv')
dataset=dataset[,3:5]
#split dataset into train and test
library(caTools)
set.seed(123)
split=sample.split(dataset$Purchased,SplitRatio = 0.75)
training_set=subset(dataset,split==TRUE)
test_set=subset(dataset,split==FALSE)
#feature scaling for both columns
training_set[,1:2]=scale(training_set[,1:2])
test_set[,1:2]=scale(test_set[,1:2])
#fitting logistic regression to dataset
classifier=glm(formula=Purchased~.,family=binomial,data=training_set)
#predicting the test set results
prob_pred=predict(classifier,type='response',newdata = test_set[-3])
y_pred=ifelse(prob_pred>0.5,1,0)
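The code works fine: y_pred is an array of 0s and 1s, so I can compare it with test_set and build a confusion matrix from the two. Then I wanted to test the model with a single value, so I added this code: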
#predict by single value
var=data.frame(Age=20,EstimatedSalary=40000)
var1=predict(classifier,type='response',newdata = var)
var2=ifelse(var1>0.5,1,0)
print(var2)
This does not behave as I would expect. No matter how I change the age and salary, it always returns:
> print(var2)
1
1
Why does this happen, and how can I fix it?
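One likely explanation, offered as a sketch rather than a confirmed diagnosis: the classifier was fitted on features transformed by scale(), while var holds raw values. The model would then read Age=20 and EstimatedSalary=40000 as 20 and 40000 standard deviations above the mean, which pushes the predicted probability to 1 for almost any input. The example below uses made-up synthetic data in place of Car_Ads.csv, and the names train_center, train_scale and var_scaled are hypothetical. It keeps the centers and scales that scale() computed on the training set and applies them to the single observation before calling predict().

```r
# Hypothetical synthetic data standing in for Car_Ads.csv (not the real dataset)
set.seed(123)
n <- 500
Age <- round(runif(n, 18, 60))
EstimatedSalary <- round(runif(n, 15000, 150000))
# Assumed relationship, for illustration only
p <- plogis(-8 + 0.12 * Age + 0.00004 * EstimatedSalary)
Purchased <- rbinom(n, 1, p)
training_set <- data.frame(Age, EstimatedSalary, Purchased)

# Scale the features and keep the centers and scales that scale() used
scaled <- scale(training_set[, 1:2])
train_center <- attr(scaled, "scaled:center")
train_scale <- attr(scaled, "scaled:scale")
training_set[, 1:2] <- scaled

classifier <- glm(Purchased ~ ., family = binomial, data = training_set)

# Raw input: the model treats 20 and 40000 as already-scaled values
var <- data.frame(Age = 20, EstimatedSalary = 40000)
raw_prob <- predict(classifier, type = "response", newdata = var)

# Same input, transformed with the training centers and scales first
var_scaled <- as.data.frame(scale(var, center = train_center, scale = train_scale))
scaled_prob <- predict(classifier, type = "response", newdata = var_scaled)

print(raw_prob)
print(scaled_prob)
```

If this is indeed the cause, the same idea would apply to test_set: scaling it with the training centers and scales, instead of its own, keeps every prediction on the scale the model was fitted on.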