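I have just started using Python for machine learning and am working on multiple linear regression. There I learned about the dummy variable trap, which can be handled by backward elimination, but when I applied backward elimination I got this error: (PatsyError: model is missing required outcome variables)
These are my imports: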
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import LabelEncoder , OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import statsmodels.formula.api as sm
These are the first 5 rows of my data set:
gender age exercise hours grade
0 female 17 3 10 82.4
1 male 18 4 4 78.2
2 male 18 5 9 79.3
3 female 14 2 7 83.2
4 female 18 4 15 87.4
real_x = data_frame.iloc[:,:4].values
real_y = data_frame.iloc[:,4:].values
label_encoder_obj = LabelEncoder()
real_x[:,0] = label_encoder_obj.fit_transform(real_x[:,0])
one_hot_encoder = OneHotEncoder(categorical_features=[2])
real_x = one_hot_encoder.fit_transform(real_x).toarray()
real_x = real_x[:,1:]
training_x,test_x,training_y,test_y = train_test_split(real_x,real_y,test_size=0.2,random_state=0)
multiple_linear_regression = LinearRegression()
multiple_linear_regression.fit(training_x,training_y)
predection_y = multiple_linear_regression.predict(test_x)
real_x=np.append(arr=np.ones((real_x.shape[0],1)).astype(int),
values=real_x,axis=1)
x_optimization = real_x[:,[0,1,2,3,4,5]]
I get the error on the line below.
regresion_ordinary_least_squar = sm.ols(real_y,data=x_optimization).fit();
# if missing == 'raise' there's no missing_mask
PatsyError: model is missing required outcome variables
I have also seen some online examples where
sm.OLS()
is used instead of
sm.ols()
What is the difference?
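You should use
import statsmodels.regression.linear_model as sm
instead of
import statsmodels.formula.api as sm
and use
regresion_ordinary_least_squar = sm.OLS(endog = real_y,exog = x_optimization).fit()
instead of
regresion_ordinary_least_squar = sm.ols(real_y,data = x_optimization).fit();
The difference is that the lowercase ols from statsmodels.formula.api expects a formula string such as 'grade ~ age + hours' together with a DataFrame, and the outcome variable is whatever is named on the left of the ~. When it receives an array instead of a formula, patsy cannot find an outcome variable and raises the PatsyError. The uppercase OLS takes the outcome array (endog) and the design matrix (exog) directly, which matches the NumPy arrays you already built.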