I have been trying to reproduce by hand the result of the Sargan test in R, so far without success.
When I run ivreg() and then print the Sargan test statistic:
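library(readstata13) # read.dta13()
library(ivreg)       # ivreg(); assumed, the question does not name the package (AER also provides ivreg())
eitc <- read.dta13('education_earnings_v2.dta')
eitc$ln.wage <- log(eitc$wage)
TSLS <- ivreg(data = eitc, ln.wage ~ educ + exper + south + nonwhite
              | nearc4 + nearc2 + exper + south + nonwhite)
summary(TSLS, diagnostics=TRUE)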
I get a Sargan statistic of 1.63. However, when I try to run the test manually:
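# First stage: regress educ on all instruments
surp_IV1 <- lm(educ ~ nearc2 + nearc4 + exper + south + nonwhite, data=eitc)
surp_IV_fit <- surp_IV1$fitted.values
# Second stage by hand, with the fitted educ in place of the observed educ
surp_IV2 <- lm(ln.wage ~ surp_IV_fit + exper + south + nonwhite, data=eitc)
surp_resid <- resid(surp_IV2)
# Auxiliary regression of the second-stage residuals on all instruments
test_surplus <- lm(surp_resid ~ nearc2 + nearc4 + exper + south + nonwhite,
                   data = eitc)
summary(test_surplus)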
With 3,010 observations and an R-squared of 0.0008032, I get a test statistic of 2.42 (3,010 × 0.0008032).
What is the reason for the difference?
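I think some of the steps are unnecessary. There is no need to rerun the two stages by hand: the test uses the residuals of the 2SLS fit itself, regressed on all instruments. The residuals of a manual second-stage lm() are computed with the fitted educ rather than the observed educ, so they are not the 2SLS residuals.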
library(wooldridge) # data(card)
library(dplyr) # rename()
library(ivreg) # ivreg()
data(card)
eitc <- card |>
  rename(nonwhite = black)
TSLS <- ivreg(lwage ~ educ + exper + south + nonwhite
              | nearc4 + nearc2 + exper + south + nonwhite,
              data = eitc)
summary(TSLS, diagnostics = TRUE)
#>
#> Call:
#> ivreg(formula = lwage ~ educ + exper + south + nonwhite | nearc4 +
#> nearc2 + exper + south + nonwhite, data = eitc)
#>
#> Residuals:
#>       Min        1Q    Median        3Q       Max
#> -2.309361 -0.319674  0.007403  0.334821  1.783133
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  2.17357    0.70494   3.083  0.00207 **
#> educ         0.24131    0.04089   5.901 4.02e-09 ***
#> exper        0.10453    0.01660   6.296 3.50e-10 ***
#> south       -0.08534    0.02647  -3.224  0.00128 **
#> nonwhite    -0.01541    0.04644  -0.332  0.74009
#>
#> Diagnostic tests:
#>                   df1  df2 statistic  p-value
#> Weak instruments    2 3004    19.682 3.22e-09 ***
#> Wu-Hausman          1 3004    27.580 1.61e-07 ***
#> Sargan              1   NA     1.626    0.202
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.4998 on 3005 degrees of freedom
#> Multiple R-Squared: -0.2668, Adjusted R-squared: -0.2685
#> Wald test: 87.8 on 4 and 3005 DF, p-value: < 2.2e-16
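# Residuals of the 2SLS fit (computed with the observed educ)
TSLS_resid <- resid(TSLS)
# Auxiliary regression of those residuals on all instruments
surp_IV1 <- lm(TSLS_resid ~ nearc2 + nearc4 + exper + south + nonwhite,
               data = eitc)
nobs(surp_IV1) * summary(surp_IV1)[["r.squared"]] # number of observations * R-squared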
#> [1] 1.625811
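For reference, here is a sketch of the statistic that the code above computes, written out as formulas. The symbols are my own notation and do not come from the ivreg documentation: x_i holds the regressors with the observed educ, \hat{x}_i holds the same regressors with educ replaced by its first-stage fitted value, z_i holds all instruments (nearc2, nearc4, exper, south, nonwhite, plus the intercept), and q is the number of overidentifying restrictions.

```latex
\begin{aligned}
\hat{u}_i &= y_i - x_i'\hat{\beta}_{2SLS}
  && \text{2SLS residuals, as returned by resid(TSLS)} \\
\hat{u}_i &= z_i'\gamma + e_i
  && \text{auxiliary regression on all instruments} \\
S &= n \, R^2_{\hat{u}} \;\overset{a}{\sim}\; \chi^2_q,
  \qquad q = 2 - 1 = 1
  && \text{two excluded instruments, one endogenous regressor} \\
\tilde{u}_i &= y_i - \hat{x}_i'\hat{\beta}_{2SLS} \neq \hat{u}_i
  && \text{residuals of a hand-run second-stage lm()}
\end{aligned}
```

A hand-run second stage gives the same coefficient estimates as ivreg(), but its residuals \tilde{u}_i are built from the fitted educ, so n times the R-squared of their auxiliary regression is not the Sargan statistic. That would explain the 2.42 in the question versus the 1.626 reported by summary(). The value q = 1 agrees with the df1 column of the Sargan row in the diagnostics table.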
Created on 2023-12-15 with reprex v2.0.2