Contrast effects in statsmodels

Question

Consider this simple example:

import pandas as pd
from statsmodels.formula.api import ols

url = "https://stats.idre.ucla.edu/stat/data/hsb2.csv"
hsb2 = pd.read_csv(url)
hsb2.head()
Out[4]: 
    id  female  race  ses  schtyp  prog  read  write  math  science  socst
0   70       0     4    1       1     1    57     52    41       47     57
1  121       1     4    2       1     3    68     59    53       63     61

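I have two categorical variables of interest (race and female), and I want the t-statistic for the mean write score in each possible combination of these variables. Of course, I can get this information indirectly by regressing write on the full interaction between the two categorical variables, using the C() notation in statsmodels: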

mod = ols("write ~ C(race)*C(female)", data=hsb2)
res = mod.fit()
print(res.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  write   R-squared:                       0.171
Model:                            OLS   Adj. R-squared:                  0.140
Method:                 Least Squares   F-statistic:                     5.642
Date:                Fri, 25 Aug 2023   Prob (F-statistic):           6.16e-06
Time:                        11:02:32   Log-Likelihood:                -714.39
No. Observations:                 200   AIC:                             1445.
Df Residuals:                     192   BIC:                             1471.
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
===============================================================================================
                                  coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------------------
Intercept                      44.3846      2.437     18.210      0.000      39.577      49.192
C(race)[T.2]                   11.2821      5.629      2.004      0.046       0.180      22.385
C(race)[T.3]                    2.6154      4.120      0.635      0.526      -5.511      10.742
C(race)[T.4]                    6.9095      2.660      2.597      0.010       1.663      12.156
C(female)[T.1]                  4.5245      3.600      1.257      0.210      -2.577      11.626
C(race)[T.2]:C(female)[T.1]    -1.3161      6.954     -0.189      0.850     -15.032      12.400
C(race)[T.3]:C(female)[T.1]    -2.6783      5.471     -0.490      0.625     -13.470       8.113
C(race)[T.4]:C(female)[T.1]     0.6749      3.886      0.174      0.862      -6.990       8.340
==============================================================================
Omnibus:                        6.095   Durbin-Watson:                   1.906
Prob(Omnibus):                  0.047   Jarque-Bera (JB):                5.710
Skew:                          -0.356   Prob(JB):                       0.0576
Kurtosis:                       2.578   Cond. No.                         23.2
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

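However, in the regression output I would like to see the full contrast effects (with no intercept), i.e. the mean score for each combination of female and race, without having to add up the main effects and interactions myself.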
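To make the manual addition concrete, here is a small sketch that rebuilds the eight cell means from the rounded coefficients printed in the summary above. The dictionary names (race, female, inter, cell_means) are only illustrative. For instance, the race 2 / female 0 cell is 44.3846 + 11.2821 = 55.6667. Note that this only recovers the point estimates; the standard errors and t-statistics of these sums cannot be obtained by adding up the printed columns.

```python
# Rounded coefficients copied from the summary table above.
intercept = 44.3846
race = {1: 0.0, 2: 11.2821, 3: 2.6154, 4: 6.9095}  # C(race)[T.k]; level 1 is the baseline
female = {0: 0.0, 1: 4.5245}                        # C(female)[T.1]; level 0 is the baseline
inter = {2: -1.3161, 3: -2.6783, 4: 0.6749}         # C(race)[T.k]:C(female)[T.1]

cell_means = {}
for r in race:
    for f in female:
        # The interaction term only applies when both factors are off baseline.
        extra = inter.get(r, 0.0) if f == 1 else 0.0
        cell_means[(r, f)] = intercept + race[r] + female[f] + extra

for (r, f), m in sorted(cell_means.items()):
    print(f"race={r} female={f}: {m:.4f}")
```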

Can I do this in statsmodels?

Answer 1

Found it.

Drop the intercept and regress on the interaction only.

mod = ols("write ~ 0 + C(race):C(female)", data=hsb2)
res = mod.fit()
print(res.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  write   R-squared:                       0.171
Model:                            OLS   Adj. R-squared:                  0.140
Method:                 Least Squares   F-statistic:                     5.642
Date:                Sat, 26 Aug 2023   Prob (F-statistic):           6.16e-06
Time:                        11:31:36   Log-Likelihood:                -714.39
No. Observations:                 200   AIC:                             1445.
Df Residuals:                     192   BIC:                             1471.
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
===========================================================================================
                              coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------------
C(race)[1]:C(female)[0]    44.3846      2.437     18.210      0.000      39.577      49.192
C(race)[2]:C(female)[0]    55.6667      5.074     10.971      0.000      45.659      65.674
C(race)[3]:C(female)[0]    47.0000      3.322     14.150      0.000      40.448      53.552
C(race)[4]:C(female)[0]    51.2941      1.066     48.131      0.000      49.192      53.396
C(race)[1]:C(female)[1]    48.9091      2.650     18.458      0.000      43.683      54.135
C(race)[2]:C(female)[1]    58.8750      3.107     18.949      0.000      52.747      65.003
C(race)[3]:C(female)[1]    48.8462      2.437     20.040      0.000      44.039      53.654
C(race)[4]:C(female)[1]    56.4935      1.002     56.409      0.000      54.518      58.469
==============================================================================
Omnibus:                        6.095   Durbin-Watson:                   1.906
Prob(Omnibus):                  0.047   Jarque-Bera (JB):                5.710
Skew:                          -0.356   Prob(JB):                       0.0576
Kurtosis:                       2.578   Cond. No.                         5.07
==============================================================================
