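I am trying to print the VIF (variance inflation factor) for each coefficient. However, I can't seem to find any documentation from statsmodels showing how to do this. I have a model with n variables to process, and a single multicollinearity value for all variables together does not help me decide which of the most collinear ones to remove.
This looks like an answer,
but how would I run it against this workbook?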
http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv
Below are the code and the summary output, which is as far as I have gotten.
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
# read data into a DataFrame
data = pd.read_csv('somepath', index_col=0)
print(data.head())
#multiregression
lm = smf.ols(formula='Sales ~ TV + Radio + Newspaper', data=data).fit()
print(lm.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  Sales   R-squared:                       0.897
Model:                            OLS   Adj. R-squared:                  0.896
Method:                 Least Squares   F-statistic:                     570.3
Date:                Wed, 15 Feb 2017   Prob (F-statistic):           1.58e-96
Time:                        13:28:29   Log-Likelihood:                -386.18
No. Observations:                 200   AIC:                             780.4
Df Residuals:                     196   BIC:                             793.6
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept      2.9389      0.312      9.422      0.000         2.324     3.554
TV             0.0458      0.001     32.809      0.000         0.043     0.049
Radio          0.1885      0.009     21.893      0.000         0.172     0.206
Newspaper     -0.0010      0.006     -0.177      0.860        -0.013     0.011
==============================================================================
Omnibus:                       60.414   Durbin-Watson:                   2.084
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              151.241
Skew:                          -1.327   Prob(JB):                     1.44e-33
Kurtosis:                       6.332   Cond. No.                         454.
==============================================================================
To get a list of the VIFs, one per column of the design matrix (including the intercept):
from statsmodels.stats.outliers_influence import variance_inflation_factor
variables = lm.model.exog
vif = [variance_inflation_factor(variables, i) for i in range(variables.shape[1])]
vif
To get their mean:
import numpy as np
np.array(vif).mean()
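A nicer format is a DataFrame that pairs each VIF with its exog variable name. Sorted in descending order, the highly collinear variables that may be a problem appear at the top. The VIFs are kept numeric so that the sort is by value and not by string.
vifs = pd.DataFrame({'variables': lm.model.exog_names, 'vif': vif})
vifs.sort_values(by='vif', ascending=False).round(2)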
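For reuse across models with many variables, the steps above could be wrapped in a small helper. This is only a sketch: `vif_table` is a hypothetical name, the demo data are synthetic, and it assumes the model was fit through the formula API so that the constant column is named `Intercept`.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(results):
    """Return one row per predictor of a fitted model, sorted by VIF (highest first)."""
    exog = results.model.exog
    names = results.model.exog_names
    # The intercept stays in exog for the calculation but is left out of the report.
    rows = [(name, variance_inflation_factor(exog, i))
            for i, name in enumerate(names) if name != 'Intercept']
    table = pd.DataFrame(rows, columns=['variable', 'vif'])
    return table.sort_values(by='vif', ascending=False).reset_index(drop=True)

# Synthetic demo data (an assumption): b is almost identical to a.
rng = np.random.default_rng(1)
df = pd.DataFrame({'a': rng.normal(size=100)})
df['b'] = df['a'] + 0.05 * rng.normal(size=100)
df['c'] = rng.normal(size=100)
df['y'] = 2 * df['a'] + df['c'] + rng.normal(size=100)

fit = smf.ols('y ~ a + b + c', data=df).fit()
table = vif_table(fit)
print(table)
```

With the model from the question, the call would be `vif_table(lm)`. The variable in the first row is the natural candidate to drop before refitting and checking the table again.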