Why is my linear regression score from sklearn low, while the R-squared value from statsmodels is high?

Problem description

I am working on a linear regression problem. An analysis with statsmodels yields an R-squared of 0.907, which is very high. The R² score of the same model computed with sklearn should therefore also be high, but the score I get is only 0.6478154705337766, which seems low.

Am I missing something? In statsmodels, the p-values of all variables are below 0.05. I did not check other statistics such as the coefficients, because I have heard from many people that this is not necessary. Details of the problem are given below.

Problem statement and the related dataset: https://datahack.analyticsvidhya.com/contest/black-friday/

Sklearn score: 0.6478154705337766

Statsmodels summary:

                                OLS Regression Results
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.907
Model:                            OLS   Adj. R-squared (uncentered):              0.907
Method:                 Least Squares   F-statistic:                          6.458e+04
Date:                Mon, 21 Oct 2019   Prob (F-statistic):                        0.00
Time:                        18:57:44   Log-Likelihood:                     -5.2226e+06
No. Observations:              550068   AIC:                                  1.045e+07
Df Residuals:                  549985   BIC:                                  1.045e+07
Df Model:                          83
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1           -59.0946      9.426     -6.269      0.000     -77.569     -40.620
x2           401.0189     10.441     38.409      0.000     380.555     421.483
x3              1e+04     23.599    423.786      0.000    9954.518       1e+04
x4          1.035e+04     21.740    475.990      0.000    1.03e+04    1.04e+04
x5           1.04e+04     23.309    446.356      0.000    1.04e+04    1.04e+04
x6          1.041e+04     26.858    387.693      0.000    1.04e+04    1.05e+04
x7          1.065e+04     27.715    384.315      0.000    1.06e+04    1.07e+04
x8          1.041e+04     32.580    319.469      0.000    1.03e+04    1.05e+04
x9           614.8732     19.178     32.061      0.000     577.285     652.462
x10          710.7823     23.135     30.723      0.000     665.438     756.126
x11          865.8851     27.138     31.906      0.000     812.695     919.076
x12          849.9004     18.358     46.296      0.000     813.919     885.881
x13          596.9014     31.632     18.870      0.000     534.904     658.899
x14          762.7278     25.809     29.553      0.000     712.143     813.312
x15          638.7214     18.085     35.319      0.000     603.276     674.166
x16          450.8858     82.928      5.437      0.000     288.349     613.423
x17          831.6309     43.033     19.325      0.000     747.287     915.975
x18         9266.9203     32.520    284.958      0.000    9203.182    9330.659
x19          548.8524     32.358     16.962      0.000     485.432     612.273
x20          819.7812     21.937     37.370      0.000     776.786     862.776
x21          575.2436     41.598     13.829      0.000     493.713     656.775
x22          780.1032     22.922     34.032      0.000     735.176     825.030
x23          854.8429     31.605     27.048      0.000     792.898     916.788
x24          603.5181     23.772     25.388      0.000     556.926     650.111
x25          635.8521     20.312     31.305      0.000     596.042     675.662
x26          455.0734     41.495     10.967      0.000     373.745     536.402
x27         1241.9456     36.844     33.708      0.000    1169.732    1314.160
x28          491.6905     21.378     23.000      0.000     449.791     533.590
x29          599.4075     10.701     56.014      0.000     578.434     620.381
x30         1024.8516     11.618     88.210      0.000    1002.080    1047.623
x31          282.3561     11.849     23.830      0.000     259.133     305.579
x32          218.2959     12.181     17.921      0.000     194.421     242.171
x33          194.9270     12.699     15.350      0.000     170.037     219.817
x34        -1038.1290     29.412    -35.296      0.000   -1095.776    -980.482
x35        -1429.4546     40.730    -35.096      0.000   -1509.284   -1349.625
x36        -1.021e+04     36.784   -277.658      0.000   -1.03e+04   -1.01e+04
x37        -5982.2095     15.651   -382.220      0.000   -6012.885   -5951.534
x38         3004.0730     28.298    106.159      0.000    2948.610    3059.536
x39         4535.2965     54.872     82.652      0.000    4427.749    4642.844
x40        -4645.1924     16.698   -278.195      0.000   -4677.919   -4612.466
x41         3110.6592    160.033     19.438      0.000    2797.000    3424.318
x42         7195.3346     48.059    149.718      0.000    7101.140    7289.529
x43        -7488.9490     24.289   -308.323      0.000   -7536.555   -7441.343
x44        -1.068e+04     53.542   -199.516      0.000   -1.08e+04   -1.06e+04
x45         -1.19e+04     45.546   -261.177      0.000    -1.2e+04   -1.18e+04
x46         1175.4639     83.574     14.065      0.000    1011.662    1339.266
x47         2354.8546     42.888     54.907      0.000    2270.795    2438.914
x48         2935.1657     35.917     81.721      0.000    2864.769    3005.562
x49        -1895.0141    134.688    -14.070      0.000   -2158.999   -1631.029
x50        -9003.5945     59.618   -151.022      0.000   -9120.444   -8886.745
x51        -1.194e+04     81.812   -145.944      0.000   -1.21e+04   -1.18e+04
x52        -1.158e+04     65.553   -176.632      0.000   -1.17e+04   -1.15e+04
x53         1489.1716     24.670     60.364      0.000    1440.819    1537.524
x54         2238.5714     93.608     23.914      0.000    2055.102    2422.041
x55         -732.7678     41.730    -17.560      0.000    -814.558    -650.978
x56          480.2321     29.776     16.128      0.000     421.872     538.592
x57         1076.8803     30.482     35.328      0.000    1017.136    1136.624
x58         1023.1860    128.939      7.935      0.000     770.470    1275.902
x59          987.0863     17.776     55.530      0.000     952.246    1021.926
x60          307.3852     45.456      6.762      0.000     218.293     396.478
x61         1979.9974     67.180     29.473      0.000    1848.327    2111.667
x62          441.5194     29.476     14.979      0.000     383.746     499.292
x63          203.3906     34.692      5.863      0.000     135.396     271.386
x64          250.2751     16.466     15.200      0.000     218.003     282.547
x65          653.6979     20.591     31.747      0.000     613.340     694.055
x66          893.8433     18.950     47.168      0.000     856.702     930.985
x67         1052.2746     29.336     35.870      0.000     994.777    1109.772
x68         1211.0301     61.789     19.599      0.000    1089.925    1332.135
x69          626.3778    131.545      4.762      0.000     368.553     884.202
x70        -3303.6544     99.019    -33.364      0.000   -3497.728   -3109.581
x71          678.0397     31.709     21.383      0.000     615.891     740.188
x72          449.4691     50.429      8.913      0.000     350.631     548.308
x73         1881.4959     33.873     55.546      0.000    1815.106    1947.886
x74          488.1976     34.729     14.057      0.000     420.130     556.266
x75         -818.2759     94.178     -8.689      0.000   -1002.861    -633.690
x76         -476.0159     78.144     -6.091      0.000    -629.176    -322.855
x77          369.1793     37.992      9.717      0.000     294.716     443.642
x78         -610.9179     49.224    -12.411      0.000    -707.395    -514.441
x79          217.0498     26.327      8.244      0.000     165.450     268.650
x80         -144.8580     24.612     -5.886      0.000    -193.097     -96.619
x81          475.4497     21.298     22.323      0.000     433.705     517.194
x82         1404.9458     27.294     51.474      0.000    1351.450    1458.442
x83          329.1859     49.154      6.697      0.000     232.846     425.526
==============================================================================
Omnibus:                    27530.062   Durbin-Watson:                   1.533
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            81968.349
Skew:                          -0.223   Prob(JB):                         0.00
Kurtosis:                       4.838   Cond. No.                         48.5
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Please let me know if you need any other information. I have not shared the exact code for sklearn and statsmodels, because I thought it might complicate the question, but I am willing to share it if necessary.
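For reference, below is a minimal sketch of how the two scores are presumably being computed. All variable names and the synthetic X and y are hypothetical stand-ins (the real features and target come from the Black Friday data above), so the snippet runs on its own:

import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                      # stand-in for the real feature matrix
y = X @ rng.normal(size=5) + rng.normal(size=1000)  # stand-in for the real target

# statsmodels: sm.OLS uses the design matrix exactly as given; with no
# constant column, the summary reports "R-squared (uncentered)".
ols_res = sm.OLS(y, X).fit()
print(ols_res.rsquared)

# scikit-learn: fit_intercept=True by default, and .score() always
# returns the centered R-squared.
lr = LinearRegression().fit(X, y)
print(lr.score(X, y))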

python scikit-learn data-science data-modeling statsmodels
1 Answer

The basic form of linear regression is the same in statsmodels and scikit-learn. However, the implementations differ and can produce different results in corner cases, and scikit-learn in general has more support for larger models. For example, statsmodels currently uses sparse matrices in only very few parts.
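To make that concrete, here is a small check (not part of the original answer; the data and names are synthetic) showing that the two libraries agree exactly when the model is set up identically, i.e. with an explicit constant in statsmodels to match scikit-learn's default intercept:

import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 2.0 + X @ np.array([1.0, -0.5, 0.3]) + rng.normal(size=500)

# statsmodels needs the intercept column added explicitly ...
res = sm.OLS(y, sm.add_constant(X)).fit()

# ... while scikit-learn adds it by default (fit_intercept=True).
lr = LinearRegression().fit(X, y)

print(res.rsquared, lr.score(X, y))   # same centered R-squared
print(res.params[1:], lr.coef_)       # same slope estimates

This also suggests where the numbers in the question diverge: the posted summary reports "R-squared (uncentered)", which statsmodels shows when the design matrix contains no constant column, while scikit-learn's .score() always computes the centered R-squared, so the 0.907 and the 0.648 are probably not the same quantity.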

The most important differences are in the surrounding infrastructure and in the use cases that are directly supported.

Statsmodels largely follows the traditional modeling approach, where we want to know how well a given model fits the data, which variables "explain" or affect the outcome, and how large the effects are. Scikit-learn follows the machine learning tradition, where the main supported task is choosing the "best" model for prediction.

As a consequence, the emphasis of statsmodels' supporting features is on analyzing the training data, which includes hypothesis tests and goodness-of-fit measures, while the emphasis of scikit-learn's supporting infrastructure is on model selection for out-of-sample prediction, and therefore on cross-validation against "test data".
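A minimal sketch of that difference in workflow, again on synthetic data with hypothetical names: statsmodels reports in-sample diagnostics through summary(), while scikit-learn's standard tooling estimates out-of-sample R-squared by cross-validation:

import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -0.5, 0.3]) + rng.normal(size=500)

# statsmodels: in-sample fit, t-tests, p-values, information criteria.
print(sm.OLS(y, sm.add_constant(X)).fit().summary())

# scikit-learn: R-squared estimated on held-out folds.
print(cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2"))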

Side note: your question would be a better fit for https://stats.stackexchange.com
