StatsModels - mse_model gives an incorrect value? (sklearn works)


Problem

I want to predict one person's exam score from the exam scores of N other people. For some reason, the OLSResults.mse_model call does not work correctly.

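I know the coefficients, the intercept, and the predictions are all correct, but for some reason this one call returns a very strange number, and I'm not sure where it comes from.

Here is an MVE of my code. I hard-coded the data and, to get at least 8 rows, appended a copy of the 4 rows I typed with every value doubled (otherwise statsmodels complains when there are fewer than 8 samples).

There are 5 predictor people, and the dependent variable is "PersonX".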


MVE

import numpy as np
import pandas as pd
import statsmodels.api as sm
rows = [
    {"Person1":79, "Person2":95, "Person3":34,"Person4":46,"Person5":10,"PersonX":50},
    {"Person1":65, "Person2":88, "Person3":45,"Person4":24,"Person5":32,"PersonX":51},
    {"Person1":87, "Person2":91, "Person3":23,"Person4":35,"Person5":10,"PersonX":78},
    {"Person1":67, "Person2":101,"Person3":34,"Person4":55,"Person5":15,"PersonX":88},
]

# Too lazy to type out four more rows: append copies of them with every value (X and y) doubled
rows += [{k:v*2 for k,v in r.items()} for r in rows]

exams = pd.DataFrame.from_records(rows)

Y = np.array(exams['PersonX'])
X = exams[[c for c in exams.columns if c != "PersonX"]]
X = sm.add_constant(X)  # prepend a "const" column of ones for the intercept
model = sm.OLS(Y,X)

results = model.fit()

y_pred = np.array(results.predict(X).round())

print(f"Y-Pred: {y_pred}")
print(f"Y-True: {Y}")
print(f"Mean squared error: {results.mse_model}")

This prints:

Y-Pred: [ 50.  51.  78.  88. 100. 102. 156. 176.]
Y-True: [ 50  51  78  88 100 102 156 176]
Mean squared error: 3611.21875

Why is the mean squared error so high? It should be essentially zero (apart from some rounding error)!
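One way to see where 3611.21875 comes from is to recompute the statistics by hand. The sketch below uses only NumPy and assumes statsmodels' documented definitions: mse_model is the explained sum of squares (ESS) divided by df_model (the rank of the design matrix minus the constant), while mse_resid is the residual sum of squares divided by df_resid. It also assumes a minimum-norm least-squares fit, which is what statsmodels' default pinv solver produces. Because the last four rows are exact multiples of the first four, the design matrix with its constant column has rank 5 rather than 6, so df_model comes out as 4.

```python
import numpy as np

# Data from the question: the four typed rows, then the same rows with
# every value doubled. Columns: Person1..Person5 (predictors), PersonX (target).
base = np.array([
    [79, 95, 34, 46, 10, 50],
    [65, 88, 45, 24, 32, 51],
    [87, 91, 23, 35, 10, 78],
    [67, 101, 34, 55, 15, 88],
], dtype=float)
data = np.vstack([base, 2 * base])

y = data[:, -1]
# Prepend a column of ones for the intercept, as sm.add_constant does
X = np.column_stack([np.ones(len(y)), data[:, :-1]])

# Minimum-norm least-squares fit (assumed equivalent to statsmodels' default "pinv")
beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta

n = len(y)
k_constant = 1                 # one constant column
df_model = rank - k_constant   # model degrees of freedom, constant excluded
df_resid = n - rank            # residual degrees of freedom

ssr = np.sum((y - y_hat) ** 2)               # residual sum of squares
centered_tss = np.sum((y - y.mean()) ** 2)   # total sum of squares about the mean
ess = centered_tss - ssr                     # explained sum of squares

mse_model = ess / df_model   # the quantity results.mse_model reports
mse_resid = ssr / df_resid   # mean squared error of the residuals
mse_plain = ssr / n          # what sklearn's mean_squared_error(y, y_hat) computes

print(f"rank={rank}, df_model={df_model}, df_resid={df_resid}")
print(f"ESS={ess:.5f}, mse_model={mse_model:.5f}")
print(f"mse_resid={mse_resid:.3e}, mean_squared_error={mse_plain:.3e}")
```

With this data the fit is exact, so ESS equals the centered total sum of squares, 14444.875, and 14444.875 / 4 = 3611.21875, the value printed in the question. Under these definitions mse_model measures explained variation per model degree of freedom, so a perfect fit makes it large rather than zero; the near-zero quantity is mse_resid (or the plain SSR / n that sklearn reports).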


Debugging the code above

So if you run the same code, delete everything below the exams line, and switch to the sklearn equivalent, you get:

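from sklearn import linear_model
from sklearn.metrics import mean_squared_error

# Same predictors and target as before (np and exams come from the MVE above)
X = exams[[c for c in exams.columns if c != "PersonX"]]
Y = np.array(exams['PersonX'])

# sklearn fits the intercept itself, so no add_constant step is needed
reg = linear_model.LinearRegression(fit_intercept=True)
reg.fit(X, Y)

y_pred = reg.predict(X)
print(f"Mean squared error: {mean_squared_error(Y, y_pred)}")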

This produces the same "perfect" y_pred as the statsmodels package: the predictions are exactly right. So what gives?


Reference material:

For reference, here are the coefficients from each package (roughly the same):

sklearn:

Coefficients: [-0.87064829  2.43878556 -4.15147725  0.08599284  2.42911432]
Intercept:    -8.526512829121202e-14

sm:

Coefficients: [-0.870648294  2.43878556 -4.15147725 0.0859928361  2.42911432]
Intercept: -1.14463994e-13

The intercepts differ slightly, although both are effectively zero (on the order of 1e-13).
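The tiny intercepts and the near-identical slopes both follow from the doubled rows. Write A for the 4×5 block of typed Person1..Person5 scores and b for the first four PersonX values. Any exact fit with intercept w0 and slopes w must satisfy w0 + A·w = b and w0 + 2A·w = 2b; subtracting gives A·w = b and therefore w0 = 0, so the reported intercepts of about 1e-13 are floating-point noise. The slopes, however, are not unique: the design matrix is rank-deficient, and adding any vector from its null space leaves the predictions unchanged. The NumPy sketch below (an illustration, not either library's actual code path) perturbs the minimum-norm solution along that null space and checks that the predictions do not move.

```python
import numpy as np

# Same data as in the question: four typed rows plus their doubled copies
base = np.array([
    [79, 95, 34, 46, 10, 50],
    [65, 88, 45, 24, 32, 51],
    [87, 91, 23, 35, 10, 78],
    [67, 101, 34, 55, 15, 88],
], dtype=float)
data = np.vstack([base, 2 * base])
y = data[:, -1]
X = np.column_stack([np.ones(len(y)), data[:, :-1]])  # constant + Person1..Person5

# Minimum-norm least-squares solution
beta_min, *_ = np.linalg.lstsq(X, y, rcond=None)

# The right-singular vector for the smallest singular value spans the null space
# of X (rank 5 with 6 columns, so the null space is one-dimensional)
_, s, vt = np.linalg.svd(X)
null_vec = vt[-1]

# A different coefficient vector that fits the data equally well
beta_other = beta_min + 10.0 * null_vec
max_diff = np.max(np.abs(X @ beta_min - X @ beta_other))

print("singular values:", s)
print("intercept (min-norm):", beta_min[0])
print("intercept component of null vector:", null_vec[0])
print("slopes (min-norm):  ", beta_min[1:])
print("slopes (perturbed): ", beta_other[1:])
print("max prediction difference:", max_diff)
```

The null vector's intercept component is (numerically) zero, matching the argument above. If, as appears to be the case, both sklearn and statsmodels' pinv solver return the minimum-norm solution, they would agree on the slopes up to rounding, which is consistent with the coefficient lists above.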

python pandas scikit-learn linear-regression statsmodels
© www.soinside.com 2019 - 2024. All rights reserved.