Understanding how a binomial GLM with factor predictors is built


The esoph dataset, which is included in the core R installation, has the following structure:

head(esoph)
  agegp     alcgp    tobgp ncases ncontrols
1 25-34 0-39g/day 0-9g/day      0        40
2 25-34 0-39g/day    10-19      0        10
3 25-34 0-39g/day    20-29      0         6
4 25-34 0-39g/day      30+      0         5
5 25-34     40-79 0-9g/day      0        27
6 25-34     40-79    10-19      0         7

str(esoph)
'data.frame':   88 obs. of  5 variables:
 $ agegp    : Ord.factor w/ 6 levels "25-34"<"35-44"<..: 1 1 1 1 1 1 1 1 1 1 ...
 $ alcgp    : Ord.factor w/ 4 levels "0-39g/day"<"40-79"<..: 1 1 1 1 2 2 2 2 3 3 ...
 $ tobgp    : Ord.factor w/ 4 levels "0-9g/day"<"10-19"<..: 1 2 3 4 1 2 3 4 1 2 ...
 $ ncases   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ ncontrols: num  40 10 6 5 27 7 4 7 2 1 ...
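A quick check, assuming R's default contrasts option has not been changed: the str() output shows that all three predictors are ordered factors, and R chooses how to encode each factor in a model matrix from getOption("contrasts"), whose default is treatment contrasts for unordered factors and orthogonal polynomial contrasts (contr.poly) for ordered ones. A factor with k levels contributes k - 1 columns, which accounts for the 12 coefficients in both summaries below. The names preds and n_coef are illustrative only.

```r
# esoph ships with R's 'datasets' package; data() copies it into the workspace
data(esoph)
preds <- esoph[c("agegp", "alcgp", "tobgp")]

# All three predictors are ordered factors ("Ord.factor" in the str() output)
sapply(preds, is.ordered)

# Contrast functions R uses by default: the first entry applies to
# unordered factors, the second to ordered factors
getOption("contrasts")

# A k-level factor contributes k - 1 coefficients: 5 + 3 + 3, plus the intercept
n_coef <- sum(sapply(preds, nlevels) - 1) + 1
n_coef
```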

I managed to fit two binomial GLMs whose predictors are ordered and unordered factors, respectively, as follows:

mod1 <- glm(cbind(ncases,ncontrols) ~ agegp + alcgp+ tobgp, esoph, family=binomial)

summary(mod1)    
Call:
glm(formula = cbind(ncases, ncontrols) ~ agegp + alcgp + tobgp, 
    family = binomial, data = esoph)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.9507  -0.7376  -0.2438   0.6130   2.4127  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.19039    0.20737  -5.740 9.44e-09 ***
agegp.L      3.99663    0.69389   5.760 8.42e-09 ***
agegp.Q     -1.65741    0.62115  -2.668  0.00762 ** 
agegp.C      0.11094    0.46815   0.237  0.81267    
agegp^4      0.07892    0.32463   0.243  0.80792    
agegp^5     -0.26219    0.21337  -1.229  0.21915    
alcgp.L      2.53899    0.26385   9.623  < 2e-16 ***
alcgp.Q      0.09376    0.22419   0.418  0.67578    
alcgp.C      0.43930    0.18347   2.394  0.01665 *  
tobgp.L      1.11749    0.24014   4.653 3.26e-06 ***
tobgp.Q      0.34516    0.22414   1.540  0.12358    
tobgp.C      0.31692    0.21091   1.503  0.13294 
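A sketch for inspecting the contrast matrix behind these coefficient names, using only base R. For an ordered factor, contr.poly() builds orthonormal polynomial columns over equally spaced scores; the suffixes .L, .Q, .C, ^4 and ^5 label the linear, quadratic, cubic, 4th-degree and 5th-degree columns, so agegp.L is the coefficient on a linear trend across the age groups rather than the effect of one particular group. The object names C_age and mm are illustrative.

```r
data(esoph)

# Contrast matrix used for the 6-level ordered factor agegp:
# 6 rows (one per level) and 5 columns named .L, .Q, .C, ^4, ^5
C_age <- contrasts(esoph$agegp)
print(round(C_age, 3))

# It matches the orthogonal polynomial contrasts for 6 equally spaced levels
all.equal(unname(C_age), unname(contr.poly(6)))

# The columns are orthonormal, so crossprod() gives the 5 x 5 identity matrix
round(crossprod(C_age), 10)

# Rescaled, the linear (.L) column is the equally spaced scores -5, -3, ..., 5
C_age[, ".L"] * sqrt(70)

# The agegp columns of the model matrix are rows of C_age:
# row 1 of esoph is in age group "25-34", i.e. the first row of C_age
mm <- model.matrix(~ agegp, data = esoph)
all.equal(unname(mm[1, -1]), unname(C_age[1, ]))
```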

mod2 <- glm(cbind(ncases,ncontrols) ~ factor(agegp, ordered=F) + 
              factor(alcgp, ordered=F) + factor(tobgp, ordered=F),
            esoph, family=binomial)

summary(mod2)
Call:
glm(formula = cbind(ncases, ncontrols) ~ factor(agegp, ordered = F) + 
    factor(alcgp, ordered = F) + factor(tobgp, ordered = F), 
    family = binomial, data = esoph)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.9507  -0.7376  -0.2438   0.6130   2.4127  

Coefficients:
                                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)                       -6.8954     1.0859  -6.350 2.16e-10 ***
factor(agegp, ordered = F)35-44    1.9809     1.1041   1.794 0.072786 .  
factor(agegp, ordered = F)45-54    3.7763     1.0680   3.536 0.000407 ***
factor(agegp, ordered = F)55-64    4.3352     1.0650   4.070 4.69e-05 ***
factor(agegp, ordered = F)65-74    4.8964     1.0764   4.549 5.39e-06 ***
factor(agegp, ordered = F)75+      4.8265     1.1213   4.304 1.67e-05 ***
factor(alcgp, ordered = F)40-79    1.4346     0.2501   5.737 9.63e-09 ***
factor(alcgp, ordered = F)80-119   1.9807     0.2848   6.956 3.51e-12 ***
factor(alcgp, ordered = F)120+     3.6029     0.3850   9.357  < 2e-16 ***
factor(tobgp, ordered = F)10-19    0.4381     0.2283   1.919 0.055039 .  
factor(tobgp, ordered = F)20-29    0.5126     0.2730   1.878 0.060398 .  
factor(tobgp, ordered = F)30+      1.6410     0.3441   4.769 1.85e-06 ***
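A sketch that refits both models from the question (writing ordered = FALSE instead of ordered = F) and suggests why the coefficients differ while the deviance residuals above are identical: both design matrices span the same column space, so the fitted probabilities are the same and only the parameterisation changes. Treatment contrasts in mod2 report each level relative to the first level, and the polynomial agegp coefficients of mod1 can be mapped back onto those differences through the contrast matrix. The helper names C_age, b_age, level_eff and rel_to_first are illustrative.

```r
data(esoph)
mod1 <- glm(cbind(ncases, ncontrols) ~ agegp + alcgp + tobgp,
            data = esoph, family = binomial)
mod2 <- glm(cbind(ncases, ncontrols) ~ factor(agegp, ordered = FALSE) +
              factor(alcgp, ordered = FALSE) + factor(tobgp, ordered = FALSE),
            data = esoph, family = binomial)

# Same model, different parameterisation: fitted probabilities and deviance agree
all.equal(fitted(mod1), fitted(mod2), tolerance = 1e-6)
all.equal(deviance(mod1), deviance(mod2), tolerance = 1e-6)

# Per-level contribution of agegp in mod1 = contrast matrix %*% coefficients
C_age <- contrasts(esoph$agegp)
b_age <- coef(mod1)[grep("^agegp", names(coef(mod1)))]
level_eff <- drop(C_age %*% b_age)

# Differences from the first level ("25-34") are what mod2 reports
rel_to_first <- level_eff[-1] - level_eff[1]
cbind(from_mod1 = rel_to_first,
      mod2 = coef(mod2)[grep("agegp", names(coef(mod2)))])
```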

However, even after searching Google and Stack Overflow, I still can't find relevant information on the following questions:

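  1. How can a GLM with family=binomial accept a response in the two-column form cbind(ncases, ncontrols), and what is the model actually trying to predict?
  2. What do the characters L, Q, C, ^4 and ^5 in the coefficient names agegp.L, agegp.Q, ..., agegp^5 of the summary(mod1) output mean?
  3. Why are the coefficients of agegp, alcgp and tobgp different depending on whether these variables are ordered or unordered factors?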
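For the first question, a sketch based on the response forms described in ?glm: with a two-column response cbind(successes, failures), the binomial GLM models the probability of a "success" (here, being a case) in each row, treating the row total as the number of trials. Fitting the observed proportion with the row totals as prior weights should give the same coefficients, and predict(type = "response") returns plogis() of the linear predictor (the inverse logit link). The column ntotal and the object mod1w are hypothetical names added for this illustration.

```r
data(esoph)
mod1 <- glm(cbind(ncases, ncontrols) ~ agegp + alcgp + tobgp,
            data = esoph, family = binomial)

# Equivalent specification: observed proportion of cases in each row,
# weighted by the number of subjects (cases + controls) in that row
esoph$ntotal <- esoph$ncases + esoph$ncontrols
mod1w <- glm(ncases / ntotal ~ agegp + alcgp + tobgp,
             data = esoph, family = binomial, weights = ntotal)
all.equal(coef(mod1), coef(mod1w))

# The model predicts P(case) for a subject in a given age/alcohol/tobacco
# cell: the inverse logit of the linear predictor
eta <- predict(mod1, type = "link")
p   <- predict(mod1, type = "response")
all.equal(p, plogis(eta))
head(cbind(observed = esoph$ncases / esoph$ntotal, predicted = p))
```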
© www.soinside.com 2019 - 2024. All rights reserved.