Testing a logistic regression model with a single value does not return the correct output


Here is a sample of the 500-record dataset I used to build the model. I want to predict whether a person of a given age and salary bought the car:

  Age  EstimatedSalary  Purchased
   23         20000        0
   47         32000        1
   31         25000        0

Here is the code:

  #Logistic Regression 

  # importing the dataset and choosing Age and Salary column
  dataset=read.csv('Car_Ads.csv')
  dataset=dataset[,3:5]

  #split dataset into train and test
  library(caTools)
  set.seed(123)
  split=sample.split(dataset$Purchased,SplitRatio = 0.75)
  training_set=subset(dataset,split==TRUE)
  test_set=subset(dataset,split==FALSE)

  #feature scaling for both columns
  training_set[,1:2]=scale(training_set[,1:2])
  test_set[,1:2]=scale(test_set[,1:2])

  #fitting logistic regression to dataset
  classifier=glm(formula=Purchased~.,family=binomial,data=training_set)

  #predicting the test set results
  prob_pred=predict(classifier,type='response',newdata = test_set[-3])
  y_pred=ifelse(prob_pred>0.5,1,0)
  

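The code works fine: y_pred is a vector of 0s and 1s that I can compare against test_set, and I can build a confusion matrix from the two. I then wanted to test the model on a single value, so I added these lines: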

 #predict by single value
 var=data.frame(Age=20,EstimatedSalary=40000)
 var1=predict(classifier,type='response',newdata = var)
 var2=ifelse(var1>0.5,1,0)
 print(var2)

The result does not make sense. No matter how I change the age and salary, it always returns:

  > print(var2)
   1 
   1 
    

Why is this happening, and how can I fix it?
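
A likely cause, offered as an assumption rather than a confirmed diagnosis: the classifier was fit on scaled features, while var holds the raw Age and EstimatedSalary. The model then reads 20 and 40000 as values many standard deviations from the training mean, and if the fitted coefficients are positive the predicted probability saturates at 1. The sketch below keeps the center and scale that scale() computed on the training set and applies them to the new row before predicting. The simulated data, the base-R split (used in place of caTools::sample.split), and the helper name predict_purchase are all hypothetical stand-ins, since Car_Ads.csv is not available here.

```r
# Synthetic stand-in for Car_Ads.csv (hypothetical data, not the real file)
set.seed(123)
n <- 500
dataset <- data.frame(
  Age = round(runif(n, 18, 60)),
  EstimatedSalary = round(runif(n, 15000, 150000))
)
# Assumed relationship: older, better-paid people buy more often
lin <- -12 + 0.2 * dataset$Age + 0.00004 * dataset$EstimatedSalary
dataset$Purchased <- rbinom(n, 1, plogis(lin))

# Simple 75/25 split in base R
idx <- sample(n, size = 0.75 * n)
training_set <- dataset[idx, ]

# Scale the training features and keep the statistics that were used
scaled <- scale(training_set[, 1:2])
train_center <- attr(scaled, "scaled:center")
train_scale <- attr(scaled, "scaled:scale")
training_set[, 1:2] <- scaled

classifier <- glm(Purchased ~ ., family = binomial, data = training_set)

# Hypothetical helper: scale raw inputs with the training statistics,
# then predict with the model that was fit on scaled features
predict_purchase <- function(age, salary) {
  raw <- data.frame(Age = age, EstimatedSalary = salary)
  new <- as.data.frame(scale(raw, center = train_center, scale = train_scale))
  prob <- predict(classifier, type = "response", newdata = new)
  ifelse(prob > 0.5, 1, 0)
}

print(predict_purchase(20, 40000))
print(predict_purchase(58, 140000))
```

The same training statistics could also be applied to test_set, instead of scaling it with its own mean and standard deviation, so that every prediction goes through the same transformation the model was trained on.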

r machine-learning rstudio logistic-regression predict