VIF per coefficient in OLS regression results in Python


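I am trying to print the VIF (variance inflation factor) for each coefficient, but I can't find any statsmodels documentation that shows how to do it. I have a model with n variables to process, and a single multicollinearity value for all the variables together doesn't help me remove the ones with the highest collinearity.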

This looks like an answer:

https://stats.stackexchange.com/questions/155028/how-to-systematically-remove-collinear-variables-in-python

But how would I run it against this data set?

http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv

Below are the code and the summary output, which is as far as I have got.

import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

# read data into a DataFrame
data = pd.read_csv('somepath', index_col=0)
print(data.head())

#multiregression
lm = smf.ols(formula='Sales ~ TV + Radio + Newspaper', data=data).fit()
print(lm.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  Sales   R-squared:                       0.897
Model:                            OLS   Adj. R-squared:                  0.896
Method:                 Least Squares   F-statistic:                     570.3
Date:                Wed, 15 Feb 2017   Prob (F-statistic):           1.58e-96
Time:                        13:28:29   Log-Likelihood:                -386.18
No. Observations:                 200   AIC:                             780.4
Df Residuals:                     196   BIC:                             793.6
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept      2.9389      0.312      9.422      0.000         2.324     3.554
TV             0.0458      0.001     32.809      0.000         0.043     0.049
Radio          0.1885      0.009     21.893      0.000         0.172     0.206
Newspaper     -0.0010      0.006     -0.177      0.860        -0.013     0.011
==============================================================================
Omnibus:                       60.414   Durbin-Watson:                   2.084
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              151.241
Skew:                          -1.327   Prob(JB):                     1.44e-33
Kurtosis:                       6.332   Cond. No.                         454.
==============================================================================
python linear-regression data-science
2 answers
7
votes

Get a list of the VIFs:

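from statsmodels.stats.outliers_influence import variance_inflation_factor

# design matrix of the fitted model; the formula API puts the Intercept column first
variables = lm.model.exog
# one VIF per column of the design matrix, in the same order as lm.model.exog_names
vif = [variance_inflation_factor(variables, i) for i in range(variables.shape[1])]
vif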

Get their mean:

import numpy as np

np.array(vif).mean()
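
To make it clearer what each entry of that list measures, here is a minimal sketch of the textbook definition, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. It uses only NumPy and synthetic data. The function name `vif_from_definition` and the columns `a`, `b`, `c` are made up for illustration and are not part of statsmodels. The sketch takes predictors only and adds its own intercept in the auxiliary regression, so for the non-intercept columns it should agree with `variance_inflation_factor` applied to a design matrix that contains a constant.

```python
import numpy as np

def vif_from_definition(X, j):
    """VIF of column j of X, where X holds predictors only (no intercept column)."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    # Auxiliary regression: column j on the remaining predictors plus an intercept
    A = np.column_stack([np.ones(len(y)), others])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    # Centered R^2 of the auxiliary regression
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

# Synthetic predictors (assumed data): c is almost a copy of a, b is independent
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)
X = np.column_stack([a, b, c])

vifs = [vif_from_definition(X, j) for j in range(X.shape[1])]
print(vifs)
```

The two nearly identical columns get a large VIF while the independent one stays close to 1. Note that the list built from `lm.model.exog` also contains an entry for the intercept, which is usually not interpreted and will distort the mean if it is included.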

0
votes

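A nicer format: a DataFrame that pairs each VIF with the name of its exog variable, sorted so that the highly collinear variables that may be problematic appear at the top.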

# `vif` is the list computed in the answer above; keep it numeric so the sort is numeric, not alphabetical
vifs = pd.DataFrame({'variables': lm.model.exog_names, 'vif': vif})
vifs.sort_values(by='vif', ascending=False).round(2)
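
Since the question is really about removing the most collinear variables, one possible way to use the VIFs is to drop the variable with the highest VIF, recompute, and repeat until everything is below a cutoff. The sketch below does this with NumPy on synthetic data. The helper names `vif` and `drop_high_vif`, the column names, and the cutoff of 10 (a common rule of thumb, not something statsmodels prescribes) are all assumptions for illustration. It is not the code from the linked stats.stackexchange answer.

```python
import numpy as np

def vif(X, j):
    """VIF of column j of X (predictors only); an intercept is added internally."""
    y = X[:, j]
    A = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

def drop_high_vif(X, names, threshold=10.0):
    """Repeatedly remove the column with the largest VIF until all are <= threshold."""
    names = list(names)
    while X.shape[1] > 1:
        vifs = [vif(X, j) for j in range(X.shape[1])]
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break
        # Drop only one column per pass, then recompute all VIFs
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return X, names

# Synthetic predictors (assumed data): tv_copy is nearly identical to tv
rng = np.random.default_rng(1)
tv = rng.normal(size=300)
radio = rng.normal(size=300)
tv_copy = tv + 0.05 * rng.normal(size=300)
X = np.column_stack([tv, radio, tv_copy])

X_kept, kept = drop_high_vif(X, ["tv", "radio", "tv_copy"])
print(kept)
```

Removing one column per pass matters because the VIFs of the remaining variables change once a collinear partner is gone. To apply the same idea to the fitted model, the predictor columns of `data` would be passed in place of the synthetic matrix, leaving the intercept out.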