MCMC 和零事件的二元逻辑回归

问题描述 投票:0回答:1

我的数据具有二进制结果和二进制因变量。 问题是几乎一个变量存在零事件。

我必须使用 OR (95% CI) 估计进行二元逻辑回归,但由于零事件而无法进行。

我在逻辑回归模型(

elrm
R 包)中尝试过类似的推理,但没有结果。

我该如何解决?

这是 elrm 的代码:

> x <- xtabs(~dead + interaction(volo_1, NACA_cat), data = db)
> x
     interaction(volo_1, NACA_cat)
morto  0.0  1.0  0.1  1.1
    0 2340  273  303   81
    1    0    0   86   21 

> cdat <- cdat <- data.frame(volo_1 = rep(0:1, 2), NACA_cat = rep(0:1, each = 2), admit = x[2, ], ntrials = colSums(x))
> cdat 
    volo_1 NACA_cat admit ntrials
0.0      0        0     0    2340
1.0      1        0     0     273
0.1      0        1    86     389
1.1      1        1    21     102
> m.volo_1 <- elrm(formula = admit/ntrials ~ volo_1, interest = ~volo_1, iter = 22000, dataset = cdat, burnIn = 2Progress: 100%                      

Generation of the Markov Chain required 3 secs
Conducting inference ...
Inference required 0 secs
> ## summary of model including estimates and CIs
> summary(m.volo_1)

Call:

elrm(formula = admit/ntrials ~ volo_1, interest = ~volo_1, iter = 22000, dataset = cdat, burnIn = 2000)


Results:

       estimate p-value p-value_se mc_size
volo_1  0.63666 0.02295    0.00166   20000


95% Confidence Intervals for Parameters

            lower    upper
volo_1 0.06414017 1.352148
> 
> 
> 
> m.NACA_cat <- elrm(formula = admit/ntrials ~ NACA_cat, interest = ~NACA_cat, iter = 22000, dataset = cdat, burnIn = 2Progress: 100%                      

Generation of the Markov Chain required 4 secs
Conducting inference ...
Inference required 0 secs
Warning message:
'NACA_cat' observed value of the sufficient statistic was not sampled 
> ## summary of model including estimates and CIs
> summary(m.NACA_cat)

Call:

elrm(formula = admit/ntrials ~ NACA_cat, interest = ~NACA_cat, iter = 22000, dataset = cdat, burnIn = 2000)


Results:

         estimate p-value p-value_se mc_size
NACA_cat       NA       0          0   20000


95% Confidence Intervals for Parameters

         lower upper
NACA_cat    NA    NA

这是二元逻辑回归的代码:

> full.model <- glm(morto ~ volo_1  + NACA_cat , data = db,family=binomial())
> logistic.display(full.model)

Logistic regression predicting morto : 1 vs 0 
 
                 crude OR(95%CI)       adj. OR(95%CI)        P(Wald's test) P(LR-test)
volo_1: 1 vs 0   1.82 (1.12,2.98)      0.91 (0.53,1.56)      0.741          0.739     
                                                                                      
NACA_cat: 1 vs 0 647257526.44 (0,Inf)  653272824.33 (0,Inf)  0.972          < 0.001   
                                                                                      
Log-likelihood = -257.3593
No. of observations = 3104
AIC value = 520.7187

编辑: 这是一个记录较少但问题相同的 MRE:

db<- data.frame( morto =  c(0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
                 volo_1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0),
                 NACA_dic = c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))



db$volo_1<- as.factor(db$volo_1)


db$morto<- as.factor(db$morto)


db$NACA_dic<- as.factor(db$NACA_dic)



full.model <- glm(morto ~ volo_1 + NACA_dic  , data = db,family=binomial())

logistic.display(full.model)

x <- xtabs(~morto + interaction(volo_1, NACA_dic), data = db)


cdat <- cdat <- data.frame(volo_1 = rep(0:1, 2), NACA_dic = rep(0:1, each = 2), 
                           admit = x[2, ], ntrials = colSums(x))
cdat 


m.volo_1 <- elrm(formula = admit/ntrials ~ volo_1, interest = ~volo_1, iter = 22000, 
                 dataset = cdat, burnIn = 2000)
summary(m.volo_1)

m.NACA_dic <- elrm(formula = admit/ntrials ~ NACA_dic, interest = ~NACA_dic, iter = 22000, 
                   dataset = cdat, burnIn = 2000)
summary(m.NACA_cat)

问题来了:

> m.NACA_dic <- elrm(formula = admit/ntrials ~ NACA_dic, interest = ~NACA_dic, iter = 22000, 
+                    dataset = cdat, burnIn = 2Progress: 100%                      

Generation of the Markov Chain required 3 secs
Conducting inference ...
Inference required 0 secs
Warning message:
'NACA_dic' observed value of the sufficient statistic was not sampled 
> summary(m.NACA_cat)

Call:

elrm(formula = admit/ntrials ~ NACA_cat, interest = ~NACA_cat, iter = 22000, dataset = cdat, burnIn = 2000)


Results:

         estimate p-value p-value_se mc_size
NACA_cat       NA       0          0   20000


95% Confidence Intervals for Parameters

         lower upper
NACA_cat    NA    NA
r regression
1个回答
0
投票

已解决:

按照@BenBolker的建议,我使用

brglm
包实现了二项式响应广义线性模型中的偏差减少

> model<-brglm(morto ~ volo_1 + NACA_dic , data = db,family=binomial, model = TRUE, method = "brglm.fit",
+              pl = FALSE, x = FALSE, y = TRUE, contrasts = NULL)
> 
> summary(model)

Call:
brglm(formula = morto ~ volo_1 + NACA_dic, family = binomial, 
    data = db, model = TRUE, method = "brglm.fit", pl = FALSE, 
    x = FALSE, y = TRUE, contrasts = NULL)


Coefficients:
            Estimate Std. Error z value Pr(>|z|)   
(Intercept)  -4.5591     1.4144  -3.223  0.00127 **
volo_11      -1.3200     0.7845  -1.683  0.09244 . 
NACA_dic1     4.6583     1.4325   3.252  0.00115 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 138.942  on 133  degrees of freedom
Residual deviance:  92.711  on 131  degrees of freedom
Penalized deviance: 90.11255 
AIC:  98.711 

> exp(coef(model))
 (Intercept)      volo_11    NACA_dic1 
  0.01047146   0.26712352 105.45428986 
> exp(confint(model))
Profiling the ordinary deviance for the corresponding ML fit...
Profiling the penalized deviance for the supplied fit...
Calculating confidence intervals for the ML fit using deviance profiles...
Calculating confidence intervals for the BR fit using penalized likelihood profiles...
                  2.5 %     97.5 %
(Intercept)  0.00000000 0.07274017
volo_11      0.03224977 1.06917625
NACA_dic1   14.10469036        Inf
> coef(summary(model))[,'Pr(>|z|)']
(Intercept)     volo_11   NACA_dic1 
0.001266712 0.092437418 0.001146851 

© www.soinside.com 2019 - 2024. All rights reserved.