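I want to fit a multiple linear regression model with 5 independent variables (2 of which are categorical).
So I first applied OneHotEncoder to convert the categorical variables into dummy variables.
These are the dependent and independent variables: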
y = df['price']
x = df[['age', 'totalRooms', 'elevator',
'floorLevel_bottom', 'floorLevel_high',
'floorLevel_low',
'floorLevel_medium','floorLevel_top',
'buildingType_bungalow', 'buildingType_plate',
'buildingType_plate_tower', 'buildingType_tower']]
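For context, the encoding step described above could look roughly like the sketch below. It is only an illustration: it uses `pandas.get_dummies` rather than scikit-learn's `OneHotEncoder`, and the toy data frame and its values are made up. Like the column list above, it keeps one dummy column for every category (nothing is dropped).

```python
import pandas as pd

# Toy data, invented for illustration only (not the real data set).
toy = pd.DataFrame({
    "price": [30000, 42000, 38000],
    "age": [5, 12, 8],
    "floorLevel": ["bottom", "high", "low"],
    "buildingType": ["tower", "plate", "tower"],
})

# One 0/1 column per category; no category is dropped,
# so each dummy group sums to 1 in every row.
encoded = pd.get_dummies(toy, columns=["floorLevel", "buildingType"], dtype=int)

print(encoded.columns.tolist())
```

The resulting column names follow the `<column>_<category>` pattern seen in the list above.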
Next I tried the two approaches below, but found that their results are not the same.
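from sklearn.linear_model import LinearRegression
# Fit ordinary least squares; scikit-learn adds the intercept itself
mlr = LinearRegression()
mlr.fit(x, y)
print('Intercept: \n', mlr.intercept_)
print("Coefficients:")
# Pair each column name with its fitted coefficient
list(zip(x, mlr.coef_))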
This gives:
Intercept: 35228.96453917408
Coefficients: [('age', 1046.5347118942063), ('totalRooms', -797.7667275033103), ('elevator', 11940.629576736419), ('floorLevel_bottom', 1011.5929167549165), ('floorLevel_high', 157.60625500592502), ('floorLevel_low', 483.89164772666277), ('floorLevel_medium', 630.9547280568961), ('floorLevel_top', -2284.0455475443687), ('buildingType_bungalow', 31610.88176756009), ('buildingType_plate', -9649.087529585862), ('buildingType_plate_tower', -8813.187607409624), ('buildingType_tower', -13148.606630564624)]
import statsmodels.api as sm
# statsmodels does not add an intercept automatically, so add a constant column
x_in = sm.add_constant(x)
model = sm.OLS(y, x_in).fit()
print(model.summary())
But this gives:
Intercept 2.43e+04
age 1046.5347
totalRooms -797.7667
elevator 1.194e+04
floorLevel_bottom 5870.7604
floorLevel_high 5016.7738
floorLevel_low 5343.0592
floorLevel_medium 5490.1223
floorLevel_top 2575.1220
buildingType_bungalow 3.768e+04
buildingType_plate -3575.1281
buildingType_plate_tower -2739.2282
buildingType_tower -7074.6472
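To make the comparison concrete, the two sets of coefficients can be subtracted from each other. The sketch below just copies the numbers printed above; the dictionary names are made up for illustration, and the statsmodels values are only as precise as the summary shows them (for example `1.194e+04`), so small gaps may be rounding.

```python
# Coefficients copied from the scikit-learn output above.
sklearn_coef = {
    "age": 1046.5347118942063,
    "totalRooms": -797.7667275033103,
    "elevator": 11940.629576736419,
    "floorLevel_bottom": 1011.5929167549165,
    "floorLevel_high": 157.60625500592502,
    "floorLevel_low": 483.89164772666277,
    "floorLevel_medium": 630.9547280568961,
    "floorLevel_top": -2284.0455475443687,
    "buildingType_bungalow": 31610.88176756009,
    "buildingType_plate": -9649.087529585862,
    "buildingType_plate_tower": -8813.187607409624,
    "buildingType_tower": -13148.606630564624,
}

# Coefficients copied from the statsmodels summary above (rounded as displayed).
statsmodels_coef = {
    "age": 1046.5347,
    "totalRooms": -797.7667,
    "elevator": 1.194e4,
    "floorLevel_bottom": 5870.7604,
    "floorLevel_high": 5016.7738,
    "floorLevel_low": 5343.0592,
    "floorLevel_medium": 5490.1223,
    "floorLevel_top": 2575.1220,
    "buildingType_bungalow": 3.768e4,
    "buildingType_plate": -3575.1281,
    "buildingType_plate_tower": -2739.2282,
    "buildingType_tower": -7074.6472,
}

# Difference (statsmodels minus scikit-learn) for every variable.
gap = {name: statsmodels_coef[name] - sklearn_coef[name] for name in sklearn_coef}

for name, value in gap.items():
    print(f"{name:26s} {value:12.4f}")
```

So age, totalRooms, and elevator agree between the two outputs (up to display rounding), while the coefficients within each dummy group differ by a roughly constant amount, and the intercepts differ as well.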
Now I don't understand the difference between them ;(