import numpy as np
import statsmodels.stats.api as sms
from statsmodels.regression.linear_model import OLS
np.random.seed(44)
n_samples, n_features = 50, 4
X = np.random.randn(n_samples, n_features)
coef = np.random.uniform(-12, 12, n_features)
y = np.dot(X, coef)  # no constant is added to X
var = 400
y += var**(1/2) * np.random.normal(size=n_samples)  # noise with variance 400
regr=OLS(y, X).fit()
print(regr.params)
print(regr.summary())
sms.linear_harvey_collier(regr)
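I get the result Ttest_1sampResult(statistic=nan, pvalue=nan). If I run the test after dropping one variable, I do get a result:
X3 = X[:, :3]
regr3 = OLS(y, X3).fit()
In [1]: sms.linear_harvey_collier(regr3)
Out[2]: Ttest_1sampResult(statistic=0.2447803429683807, pvalue=0.806727747845282)
Is it a problem that I did not add a constant (intercept)? That is only a hunch, and if it really is a problem, I don't understand why.

linear_harvey_collier has only two lines of code. The workaround is to compute the test directly: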
res = regr
from scipy import stats
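# Use the number of regressors as skip; this works around the bug in linear_harvey_collier.
skip = len(res.params)
rr = sms.recursive_olsresiduals(res, skip=skip, alpha=0.95, order_by=None)
# rr[3] holds the standardized recursive residuals; test whether their mean is zero.
stats.ttest_1samp(rr[3][skip:], 0)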
Ttest_1sampResult(statistic=0.03092937323130299, pvalue=0.9754626388210277)
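
To illustrate why the choice of skip matters, here is a minimal NumPy sketch of recursive residuals written from scratch. It is not the statsmodels implementation: the helper `recursive_residuals` and the variable `rank_first_three` are hypothetical names used only for this example. The sketch assumes that the initial fit needs at least as many observations as there are regressors, which is consistent with the three-variable model working and the four-variable model returning nan.

```python
import numpy as np
from scipy import stats


def recursive_residuals(y, X, skip):
    """Scaled one-step-ahead forecast errors, starting after the first `skip` rows.

    Hypothetical helper for illustration; not the statsmodels implementation.
    """
    resid = []
    for t in range(skip, len(y)):
        X_past, y_past = X[:t], y[:t]
        # Fit OLS on the first t observations only.
        beta, *_ = np.linalg.lstsq(X_past, y_past, rcond=None)
        # Forecast-error variance factor: 1 + x_t' (X'X)^{-1} x_t.
        xtx_inv = np.linalg.inv(X_past.T @ X_past)
        scale = np.sqrt(1.0 + X[t] @ xtx_inv @ X[t])
        resid.append((y[t] - X[t] @ beta) / scale)
    return np.array(resid)


# Same data-generating setup as in the question (no constant in X).
np.random.seed(44)
n_samples, n_features = 50, 4
X = np.random.randn(n_samples, n_features)
coef = np.random.uniform(-12, 12, n_features)
y = X @ coef + np.sqrt(400) * np.random.normal(size=n_samples)

# Three rows cannot identify four coefficients: the rank is below n_features.
rank_first_three = np.linalg.matrix_rank(X[:3])
print(rank_first_three, "<", n_features)

# Starting from skip = number of regressors gives finite residuals.
rr = recursive_residuals(y, X, skip=n_features)
result = stats.ttest_1samp(rr, 0)
print(result.statistic, result.pvalue)
```

The point of the sketch is that the number of rows used for the initial fit has to scale with the number of regressors, and that this holds whether or not X contains a constant column. The missing intercept is therefore not what produces the nan.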