The esoph data set, which ships with the core R installation, has the following structure:
head(esoph)
  agegp     alcgp    tobgp ncases ncontrols
1 25-34 0-39g/day 0-9g/day      0        40
2 25-34 0-39g/day    10-19      0        10
3 25-34 0-39g/day    20-29      0         6
4 25-34 0-39g/day      30+      0         5
5 25-34     40-79 0-9g/day      0        27
6 25-34     40-79    10-19      0         7
str(esoph)
'data.frame': 88 obs. of 5 variables:
$ agegp : Ord.factor w/ 6 levels "25-34"<"35-44"<..: 1 1 1 1 1 1 1 1 1 1 ...
$ alcgp : Ord.factor w/ 4 levels "0-39g/day"<"40-79"<..: 1 1 1 1 2 2 2 2 3 3 ...
$ tobgp : Ord.factor w/ 4 levels "0-9g/day"<"10-19"<..: 1 2 3 4 1 2 3 4 1 2 ...
$ ncases : num 0 0 0 0 0 0 0 0 0 0 ...
$ ncontrols: num 40 10 6 5 27 7 4 7 2 1 ...
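As a quick check on the structure above, the sketch below confirms that all three predictors are ordered factors and prints the contrast settings R applies to unordered and ordered factors. It assumes a fresh R session with default options; the variable names `ord` and `ctr` are only illustrative.

```r
# esoph ships with base R (package 'datasets'), so no extra packages are needed.
data(esoph)

# TRUE for each predictor that is stored as an ordered factor.
ord <- sapply(esoph[c("agegp", "alcgp", "tobgp")], is.ordered)
print(ord)

# Default contrasts: the first entry is used for unordered factors,
# the second for ordered factors (assuming options() were not changed).
ctr <- getOption("contrasts")
print(ctr)
```

With default options the second entry is `contr.poly`, which is where the different coefficient names for ordered factors come from.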
I managed to fit two binomial GLMs, one with the predictors as ordered factors and one with them as unordered factors, as shown below:
mod1 <- glm(cbind(ncases, ncontrols) ~ agegp + alcgp + tobgp, data = esoph, family = binomial)
summary(mod1)
Call:
glm(formula = cbind(ncases, ncontrols) ~ agegp + alcgp + tobgp,
family = binomial, data = esoph)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.9507  -0.7376  -0.2438   0.6130   2.4127
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.19039    0.20737  -5.740 9.44e-09 ***
agegp.L      3.99663    0.69389   5.760 8.42e-09 ***
agegp.Q     -1.65741    0.62115  -2.668  0.00762 **
agegp.C      0.11094    0.46815   0.237  0.81267
agegp^4      0.07892    0.32463   0.243  0.80792
agegp^5     -0.26219    0.21337  -1.229  0.21915
alcgp.L      2.53899    0.26385   9.623  < 2e-16 ***
alcgp.Q      0.09376    0.22419   0.418  0.67578
alcgp.C      0.43930    0.18347   2.394  0.01665 *
tobgp.L      1.11749    0.24014   4.653 3.26e-06 ***
tobgp.Q      0.34516    0.22414   1.540  0.12358
tobgp.C      0.31692    0.21091   1.503  0.13294
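For context on the two-column response used in mod1: R's documentation for binomial families describes a two-column matrix response as counts of successes and failures, so here each row contributes ncases "successes" out of ncases + ncontrols trials. The sketch below is one way to probe that reading, by refitting the same model with the observed proportion as the response and the row totals as prior weights. The names `d`, `prop`, `total`, and `mod1w` are illustrative, and printed numbers may differ from the output above depending on the R version's copy of esoph.

```r
# Same model as mod1 in the question, written with named arguments.
mod1 <- glm(cbind(ncases, ncontrols) ~ agegp + alcgp + tobgp,
            data = esoph, family = binomial)

# Illustrative helper columns: trials per row and observed proportion of cases.
d <- transform(esoph,
               total = ncases + ncontrols,
               prop  = ncases / (ncases + ncontrols))

# Alternative specification: proportion response plus number of trials as weights.
mod1w <- glm(prop ~ agegp + alcgp + tobgp,
             data = d, family = binomial, weights = total)

# The two specifications should give the same coefficients.
same_coef <- isTRUE(all.equal(coef(mod1), coef(mod1w), tolerance = 1e-6))
print(same_coef)

# Fitted values are on the probability scale, next to the observed proportions.
print(head(cbind(observed = d$prop, fitted = fitted(mod1))))
```

If the coefficients agree, that supports the reading that the fitted values estimate the proportion of cases in each age, alcohol, and tobacco cell.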
mod2 <- glm(cbind(ncases,ncontrols) ~ factor(agegp, ordered=F) +
factor(alcgp, ordered=F) + factor(tobgp, ordered=F),
esoph, family=binomial)
summary(mod2)
Call:
glm(formula = cbind(ncases, ncontrols) ~ factor(agegp, ordered = F) +
factor(alcgp, ordered = F) + factor(tobgp, ordered = F),
family = binomial, data = esoph)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.9507  -0.7376  -0.2438   0.6130   2.4127
Coefficients:
                                 Estimate Std. Error z value Pr(>|z|)
(Intercept)                       -6.8954     1.0859  -6.350 2.16e-10 ***
factor(agegp, ordered = F)35-44    1.9809     1.1041   1.794 0.072786 .
factor(agegp, ordered = F)45-54    3.7763     1.0680   3.536 0.000407 ***
factor(agegp, ordered = F)55-64    4.3352     1.0650   4.070 4.69e-05 ***
factor(agegp, ordered = F)65-74    4.8964     1.0764   4.549 5.39e-06 ***
factor(agegp, ordered = F)75+      4.8265     1.1213   4.304 1.67e-05 ***
factor(alcgp, ordered = F)40-79    1.4346     0.2501   5.737 9.63e-09 ***
factor(alcgp, ordered = F)80-119   1.9807     0.2848   6.956 3.51e-12 ***
factor(alcgp, ordered = F)120+     3.6029     0.3850   9.357  < 2e-16 ***
factor(tobgp, ordered = F)10-19    0.4381     0.2283   1.919 0.055039 .
factor(tobgp, ordered = F)20-29    0.5126     0.2730   1.878 0.060398 .
factor(tobgp, ordered = F)30+      1.6410     0.3441   4.769 1.85e-06 ***
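The deviance residuals printed for the two models are identical, which suggests that both describe the same fit under different coefficient codings. The sketch below checks this directly by comparing the deviances and fitted values. It refits both models so that it is self-contained, writes `ordered = FALSE` instead of `ordered = F`, and uses the illustrative names `same_deviance` and `same_fitted`.

```r
# Ordered factors, as stored in esoph.
mod1 <- glm(cbind(ncases, ncontrols) ~ agegp + alcgp + tobgp,
            data = esoph, family = binomial)

# Same predictors converted to unordered factors.
mod2 <- glm(cbind(ncases, ncontrols) ~ factor(agegp, ordered = FALSE) +
              factor(alcgp, ordered = FALSE) + factor(tobgp, ordered = FALSE),
            data = esoph, family = binomial)

# Compare overall fit and per-row fitted probabilities (small numerical tolerance).
same_deviance <- isTRUE(all.equal(deviance(mod1), deviance(mod2),
                                  tolerance = 1e-6))
same_fitted   <- isTRUE(all.equal(unname(fitted(mod1)), unname(fitted(mod2)),
                                  tolerance = 1e-6))
print(c(same_deviance = same_deviance, same_fitted = same_fitted))
```

If both comparisons come back TRUE, the differences between the two coefficient tables come from how the factor levels are encoded as columns, not from a different model.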
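I tried Google and searched Stack Overflow, but I still cannot find answers to the following questions:
1. How can a GLM with family=binomial accept a response variable given in the two-column form cbind(ncases, ncontrols)? What is the model actually trying to predict?
2. What do the suffixes L, Q, C, ^4, ^5 in agegp.L, agegp.Q, ..., agegp^5 in the output of summary(mod1) mean?
3. Why do the coefficients of agegp, alcgp, and tobgp differ depending on whether these variables are ordered or unordered factors?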
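As a starting point for the questions about coefficient names, the sketch below prints the two contrast matrices that R uses by default: `contr.poly` for ordered factors and `contr.treatment` for unordered ones. The value 6 is the number of agegp levels; `cp` and `ct` are illustrative names. This only shows the coding, it is not a full explanation of the model output.

```r
# Polynomial contrasts for a 6-level ordered factor such as agegp.
cp <- contr.poly(6)
print(colnames(cp))      # column labels used as coefficient suffixes
print(round(cp, 3))

# The polynomial columns are orthonormal: their cross-product is the identity.
is_orthonormal <- isTRUE(all.equal(crossprod(cp), diag(5),
                                   check.attributes = FALSE))
print(is_orthonormal)

# Treatment (dummy) contrasts for a 6-level unordered factor:
# one indicator column per level except the first (the reference level).
ct <- contr.treatment(6)
print(ct)
```

The column labels of `cp` match the suffixes in summary(mod1), while the indicator columns of `ct` correspond to the per-level coefficients in summary(mod2).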