Python statsmodels output does not match Excel/Google Sheets output


I have a small dataset, and for some reason the statsmodels output does not match what Excel gives me.

Here is what I did. I have two columns:

Miles Traveled  Travel Time
89 7.0
66 5.4
78 6.6
111 7.4
44 4.8
77 6.4
80 7.0
66 5.6
109 7.3
76 6.4

This is the output I get in Google Sheets:

                                  Slope            Intercept
Coefficient                       0.04025678079    3.185560249
Standard error                    0.005706415564   0.4669507938
R squared / std. error of y       0.8615153295     0.3423088398
F statistic / degrees of freedom  49.76812677      8
Regression SS / residual SS       5.831597265      0.9374027345

This output also matches the Excel output.

However, when I do the following in statsmodels:

import statsmodels.api as sm

milesTraveled = [89.0, 66.0, 78.0, 111.0, 44.0, 77.0, 80.0, 66.0, 109.0, 76.0]
travelTime = [7.0, 5.4, 6.6, 7.4, 4.8, 6.4, 7.0, 5.6, 7.3, 6.4]

# Regress travel time (endog) on miles traveled (exog)
model = sm.OLS(travelTime, milesTraveled).fit()
print(model.summary())

I get the following:

                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:            Travel Time   R-squared (uncentered):                   0.985
Model:                            OLS   Adj. R-squared (uncentered):              0.983
Method:                 Least Squares   F-statistic:                              575.6
Date:                Mon, 01 Feb 2021   Prob (F-statistic):                    1.82e-09
Time:                        10:18:44   Log-Likelihood:                         -11.951
No. Observations:                  10   AIC:                                      25.90
Df Residuals:                       9   BIC:                                      26.20
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==================================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
Miles Traveled     0.0781      0.003     23.991      0.000       0.071       0.085
==============================================================================
Omnibus:                        2.179   Durbin-Watson:                   2.654
Prob(Omnibus):                  0.336   Jarque-Bera (JB):                1.033
Skew:                          -0.777   Prob(JB):                        0.597
Kurtosis:                       2.741   Cond. No.                         1.00
==============================================================================

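As you can see, the standard error, R-squared, and other values do not match Google Sheets/Excel at all. What exactly am I doing wrong? What should I do to get a results summary that agrees with Google Sheets/Excel?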

python pandas numpy statsmodels
1 Answer

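By default, the OLS class does not include a constant (intercept) term in the linear model, so your fit was forced through the origin. That is why the summary reports "R-squared (uncentered)" and a single coefficient. You can use sm.add_constant to create the appropriate exog argument for OLS: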

In [36]: milesTraveled = [89.0, 66.0, 78.0, 111.0, 44.0, 77.0, 80.0, 66.0, 109.0, 76.0]

In [37]: travelTime = [7.0, 5.4, 6.6, 7.4, 4.8, 6.4, 7.0, 5.6, 7.3, 6.4]

In [38]: X = sm.add_constant(milesTraveled)

In [39]: model = sm.OLS(travelTime, X).fit()

In [40]: print(model.summary())
/Users/warren/a2020.11/lib/python3.8/site-packages/scipy/stats/stats.py:1603: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=10
  warnings.warn("kurtosistest only valid for n>=20 ... continuing "
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.862
Model:                            OLS   Adj. R-squared:                  0.844
Method:                 Least Squares   F-statistic:                     49.77
Date:                Mon, 01 Feb 2021   Prob (F-statistic):           0.000107
Time:                        13:04:53   Log-Likelihood:                -2.3532
No. Observations:                  10   AIC:                             8.706
Df Residuals:                       8   BIC:                             9.312
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          3.1856      0.467      6.822      0.000       2.109       4.262
x1             0.0403      0.006      7.055      0.000       0.027       0.053
==============================================================================
Omnibus:                        0.542   Durbin-Watson:                   2.608
Prob(Omnibus):                  0.763   Jarque-Bera (JB):                0.554
Skew:                           0.370   Prob(JB):                        0.758
Kurtosis:                       2.115   Cond. No.                         353.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
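
To see where the two sets of numbers come from, here is a minimal sketch in plain Python (no statsmodels needed) that computes both fits by hand from the closed-form least-squares formulas. The variable names are my own, chosen for illustration. It should reproduce, up to rounding, the slope and uncentered R-squared from the first summary and the slope, intercept, and R-squared from the spreadsheet.

```python
miles = [89.0, 66.0, 78.0, 111.0, 44.0, 77.0, 80.0, 66.0, 109.0, 76.0]
time = [7.0, 5.4, 6.6, 7.4, 4.8, 6.4, 7.0, 5.6, 7.3, 6.4]
n = len(miles)

# --- Fit through the origin (no constant term): time = b * miles ---
sum_xy = sum(x * y for x, y in zip(miles, time))
sum_xx = sum(x * x for x in miles)
sum_yy = sum(y * y for y in time)
slope_origin = sum_xy / sum_xx
ss_res_origin = sum((y - slope_origin * x) ** 2 for x, y in zip(miles, time))
# Uncentered R-squared compares against the raw sum of squares of y
r2_uncentered = 1 - ss_res_origin / sum_yy

# --- Fit with an intercept: time = a + b * miles ---
mean_x = sum(miles) / n
mean_y = sum(time) / n
sxx = sum((x - mean_x) ** 2 for x in miles)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(miles, time))
slope = sxy / sxx
intercept = mean_y - slope * mean_x
ss_res = sum((y - intercept - slope * x) ** 2 for x, y in zip(miles, time))
# Centered R-squared compares against the variation of y around its mean
ss_tot = sum((y - mean_y) ** 2 for y in time)
r2 = 1 - ss_res / ss_tot

print(f"no intercept:   slope={slope_origin:.4f}  R2(uncentered)={r2_uncentered:.3f}")
print(f"with intercept: slope={slope:.4f}  intercept={intercept:.4f}  R2={r2:.3f}")
```

The two R-squared values are not comparable: the uncentered version divides by the sum of y squared rather than by the variation of y around its mean, so it looks much larger even though the through-origin model fits this data worse.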