Consider this simple example:
import pandas as pd
from statsmodels.formula.api import ols
url = "https://stats.idre.ucla.edu/stat/data/hsb2.csv"
hsb2 = pd.read_table(url, delimiter=",")
hsb2.head()
Out[4]:
    id  female  race  ses  schtyp  prog  read  write  math  science  socst
0   70       0     4    1       1     1    57     52    41       47     57
1  121       1     4    2       1     3    68     59    53       63     61
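I have two categorical variables of interest (race and female), and I want to compute the t-statistics for the mean write score for every possible combination of these variables. Of course, I can get this information indirectly by regressing write on the full interaction between these categorical variables, using the C() notation in statsmodels: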
mod = ols("write ~ C(race)*C(female)", data=hsb2)
res = mod.fit()
print(res.summary())
OLS Regression Results
==============================================================================
Dep. Variable: write R-squared: 0.171
Model: OLS Adj. R-squared: 0.140
Method: Least Squares F-statistic: 5.642
Date: Fri, 25 Aug 2023 Prob (F-statistic): 6.16e-06
Time: 11:02:32 Log-Likelihood: -714.39
No. Observations: 200 AIC: 1445.
Df Residuals: 192 BIC: 1471.
Df Model: 7
Covariance Type: nonrobust
===============================================================================================
                                    coef    std err          t      P>|t|     [0.025     0.975]
-----------------------------------------------------------------------------------------------
Intercept                        44.3846      2.437     18.210      0.000     39.577     49.192
C(race)[T.2]                     11.2821      5.629      2.004      0.046      0.180     22.385
C(race)[T.3]                      2.6154      4.120      0.635      0.526     -5.511     10.742
C(race)[T.4]                      6.9095      2.660      2.597      0.010      1.663     12.156
C(female)[T.1]                    4.5245      3.600      1.257      0.210     -2.577     11.626
C(race)[T.2]:C(female)[T.1]      -1.3161      6.954     -0.189      0.850    -15.032     12.400
C(race)[T.3]:C(female)[T.1]      -2.6783      5.471     -0.490      0.625    -13.470      8.113
C(race)[T.4]:C(female)[T.1]       0.6749      3.886      0.174      0.862     -6.990      8.340
==============================================================================
Omnibus: 6.095 Durbin-Watson: 1.906
Prob(Omnibus): 0.047 Jarque-Bera (JB): 5.710
Skew: -0.356 Prob(JB): 0.0576
Kurtosis: 2.578 Cond. No. 23.2
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
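However, in the regression output I would like to see the full contrast effects (without an intercept), i.e. the mean score for each combination of female and race, without having to add up the main effects and interactions myself.
Can I do this in statsmodels?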
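To make the manual bookkeeping concrete: under treatment coding, each cell mean is the intercept plus whichever main effects and interaction term apply to that cell. Here is a minimal sketch of that arithmetic, using the coefficients printed in the summary above. The helper name cell_mean is hypothetical and not part of statsmodels.

```python
# Coefficients copied from the treatment-coded OLS summary above.
coefs = {
    "Intercept": 44.3846,
    "C(race)[T.2]": 11.2821,
    "C(race)[T.3]": 2.6154,
    "C(race)[T.4]": 6.9095,
    "C(female)[T.1]": 4.5245,
    "C(race)[T.2]:C(female)[T.1]": -1.3161,
    "C(race)[T.3]:C(female)[T.1]": -2.6783,
    "C(race)[T.4]:C(female)[T.1]": 0.6749,
}


def cell_mean(race, female):
    """Rebuild the mean of write for one (race, female) cell.

    Hypothetical helper: race 1 and female 0 are the reference levels,
    so they contribute nothing beyond the intercept.
    """
    total = coefs["Intercept"]
    if race != 1:
        total += coefs[f"C(race)[T.{race}]"]
    if female == 1:
        total += coefs["C(female)[T.1]"]
    if race != 1 and female == 1:
        total += coefs[f"C(race)[T.{race}]:C(female)[T.1]"]
    return total


# One entry per (race, female) combination.
cell_means = {(r, f): cell_mean(r, f) for f in (0, 1) for r in (1, 2, 3, 4)}
for cell, value in cell_means.items():
    print(cell, round(value, 4))
```

This recovers the means, but not their standard errors or t-statistics, which is why I would like the regression output to report the cells directly.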
Found it.
Drop the intercept and regress on the interaction only.
mod = ols("write ~ 0+C(race):C(female)", data=hsb2)
res = mod.fit()  # refit; otherwise res still holds the earlier model's results
print(res.summary())
OLS Regression Results
==============================================================================
Dep. Variable: write R-squared: 0.171
Model: OLS Adj. R-squared: 0.140
Method: Least Squares F-statistic: 5.642
Date: Sat, 26 Aug 2023 Prob (F-statistic): 6.16e-06
Time: 11:31:36 Log-Likelihood: -714.39
No. Observations: 200 AIC: 1445.
Df Residuals: 192 BIC: 1471.
Df Model: 7
Covariance Type: nonrobust
===========================================================================================
                                coef    std err          t      P>|t|     [0.025     0.975]
-------------------------------------------------------------------------------------------
C(race)[1]:C(female)[0]      44.3846      2.437     18.210      0.000     39.577     49.192
C(race)[2]:C(female)[0]      55.6667      5.074     10.971      0.000     45.659     65.674
C(race)[3]:C(female)[0]      47.0000      3.322     14.150      0.000     40.448     53.552
C(race)[4]:C(female)[0]      51.2941      1.066     48.131      0.000     49.192     53.396
C(race)[1]:C(female)[1]      48.9091      2.650     18.458      0.000     43.683     54.135
C(race)[2]:C(female)[1]      58.8750      3.107     18.949      0.000     52.747     65.003
C(race)[3]:C(female)[1]      48.8462      2.437     20.040      0.000     44.039     53.654
C(race)[4]:C(female)[1]      56.4935      1.002     56.409      0.000     54.518     58.469
==============================================================================
Omnibus: 6.095 Durbin-Watson: 1.906
Prob(Omnibus): 0.047 Jarque-Bera (JB): 5.710
Skew: -0.356 Prob(JB): 0.0576
Kurtosis: 2.578 Cond. No. 5.07
==============================================================================
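For anyone wondering what these numbers are: in a no-intercept model with one dummy per cell, each coefficient is that cell's sample mean, and with the default nonrobust covariance each standard error is sqrt(MSE / n_cell), where MSE is the residual variance pooled over all cells. Each t value therefore tests whether that cell's mean is zero. The sketch below reproduces that calculation in plain Python on hypothetical toy data (the rows are made up and are not taken from hsb2).

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy records: (race, female, write). Not taken from hsb2.
rows = [
    (1, 0, 40), (1, 0, 48),
    (1, 1, 50), (1, 1, 54), (1, 1, 46),
    (2, 0, 55), (2, 0, 61),
    (2, 1, 60), (2, 1, 58),
]

# Group the outcome by (race, female) cell.
cells = defaultdict(list)
for race, female, score in rows:
    cells[(race, female)].append(score)

# The OLS coefficient for each cell dummy equals the cell mean.
means = {cell: sum(v) / len(v) for cell, v in cells.items()}

# Pooled residual variance: SSE / (N - number of cells).
sse = sum((y - means[cell]) ** 2 for cell, v in cells.items() for y in v)
df_resid = len(rows) - len(cells)
mse = sse / df_resid

# Standard error and t statistic (H0: cell mean == 0) for each cell.
std_errs = {cell: sqrt(mse / len(v)) for cell, v in cells.items()}
t_stats = {cell: means[cell] / std_errs[cell] for cell in cells}

for cell in sorted(cells):
    print(cell, means[cell], round(std_errs[cell], 3), round(t_stats[cell], 3))
```

Because the variance is pooled, these are not the same as separate one-sample t-tests run within each cell, which would use each cell's own variance.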