Regression using Python


I want to fit a multiple linear regression model with 5 independent variables, 2 of which are categorical.

So I first applied OneHotEncoder to convert the categorical variables into dummy variables.
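
A minimal sketch of this encoding step, for illustration only. The toy DataFrame `df_demo` and its values are made up, and `pandas.get_dummies` is used here as a stand-in for scikit-learn's OneHotEncoder; only the column naming (`floorLevel_low`, `buildingType_tower`, ...) mirrors the question.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the question.
df_demo = pd.DataFrame({
    "age": [5, 12, 30, 8],
    "floorLevel": ["low", "high", "top", "low"],
    "buildingType": ["tower", "plate", "tower", "bungalow"],
})

# One dummy column per category level, named <column>_<level>.
encoded = pd.get_dummies(
    df_demo, columns=["floorLevel", "buildingType"], dtype=int
)
print(sorted(encoded.columns))

# Within each categorical variable, exactly one dummy is 1 in every row,
# so the dummies of one group always sum to 1.
floor_cols = [c for c in encoded.columns if c.startswith("floorLevel_")]
print(encoded[floor_cols].sum(axis=1).tolist())
```

Because each group of dummies sums to 1 in every row, keeping all levels makes the group collinear with an intercept column. This is worth keeping in mind when comparing fitted coefficients.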

These are the dependent and independent variables:

y = df['price']
x = df[['age', 'totalRooms', 'elevator',
        'floorLevel_bottom', 'floorLevel_high', 
        'floorLevel_low',
        'floorLevel_medium','floorLevel_top',
        'buildingType_bungalow', 'buildingType_plate', 
        'buildingType_plate_tower', 'buildingType_tower']]

Next I tried the two approaches below, but found that their results differ.

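from sklearn.linear_model import LinearRegression

# Fit ordinary least squares; scikit-learn adds the intercept itself
mlr = LinearRegression()
mlr.fit(x, y)

print('Intercept: \n', mlr.intercept_)
print("Coefficients:")
# Pair each column name with its fitted coefficient
list(zip(x.columns, mlr.coef_))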

This gives:

Intercept: 35228.96453917408

Coefficients: [('age', 1046.5347118942063), ('totalRooms', -797.7667275033103), ('elevator', 11940.629576736419), ('floorLevel_bottom', 1011.5929167549165), ('floorLevel_high', 157.60625500592502), ('floorLevel_low', 483.89164772666277), ('floorLevel_medium', 630.9547280568961), ('floorLevel_top', -2284.0455475443687), ('buildingType_bungalow', 31610.88176756009), ('buildingType_plate', -9649.087529585862), ('buildingType_plate_tower', -8813.187607409624), ('buildingType_tower', -13148.606630564624)]

import statsmodels.api as sm

# add_constant adds an explicit intercept column to the design matrix
x_in = sm.add_constant(x)
model = sm.OLS(y, x_in).fit()
print(model.summary())

But this gives:

Intercept                   2.43e+04
age                         1046.5347
totalRooms                  -797.7667
elevator                    1.194e+04
floorLevel_bottom           5870.7604
floorLevel_high             5016.7738
floorLevel_low              5343.0592
floorLevel_medium           5490.1223
floorLevel_top              2575.1220
buildingType_bungalow       3.768e+04
buildingType_plate          -3575.1281
buildingType_plate_tower    -2739.2282
buildingType_tower          -7074.6472

Now I don't understand the difference between them ;(
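
One way to look at the two outputs is to subtract them term by term. The sketch below uses only the numbers printed above; the variable names are made up for illustration. `buildingType_bungalow` and the intercept are left out of the per-coefficient comparison because the statsmodels summary prints them rounded to four significant digits.

```python
# Dummy coefficients copied from the two outputs in the question.
sklearn_coef = {
    "floorLevel_bottom": 1011.5929167549165,
    "floorLevel_high": 157.60625500592502,
    "floorLevel_low": 483.89164772666277,
    "floorLevel_medium": 630.9547280568961,
    "floorLevel_top": -2284.0455475443687,
    "buildingType_plate": -9649.087529585862,
    "buildingType_plate_tower": -8813.187607409624,
    "buildingType_tower": -13148.606630564624,
}
statsmodels_coef = {
    "floorLevel_bottom": 5870.7604,
    "floorLevel_high": 5016.7738,
    "floorLevel_low": 5343.0592,
    "floorLevel_medium": 5490.1223,
    "floorLevel_top": 2575.1220,
    "buildingType_plate": -3575.1281,
    "buildingType_plate_tower": -2739.2282,
    "buildingType_tower": -7074.6472,
}

# Difference between the two fits, coefficient by coefficient.
shifts = {
    name: statsmodels_coef[name] - sklearn_coef[name]
    for name in sklearn_coef
}
for name, shift in shifts.items():
    print(f"{name:26s} shift = {shift:10.4f}")

floor_shifts = [v for k, v in shifts.items() if k.startswith("floorLevel_")]
building_shifts = [v for k, v in shifts.items() if k.startswith("buildingType_")]

# The intercepts differ in the opposite direction
# (2.43e+04 is the rounded value shown in the summary).
intercept_gap = 35228.96453917408 - 2.43e+04
print("sum of group shifts:", floor_shifts[0] + building_shifts[0])
print("intercept gap:      ", intercept_gap)
```

The coefficients for age, totalRooms and elevator agree in both outputs. Every floorLevel coefficient is shifted by the same amount (about 4859.17), every buildingType coefficient by another constant (about 6073.96), and the intercept moves the other way by roughly their sum. This pattern is what one would expect when every level of each categorical variable is kept alongside an intercept (often called the dummy variable trap). The dummies of one group sum to 1, so a constant can move between that group and the intercept without changing any prediction. The least-squares solution is then not unique, and the two libraries may simply be returning different but equally valid solutions. A common remedy, offered here as a suggestion rather than a confirmed diagnosis, is to drop one level per categorical variable, for example with `drop='first'` in OneHotEncoder or `drop_first=True` in `pandas.get_dummies`.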

python linear-regression olsmultiplelinearregression