Cox regression with multiple predictors: how to avoid an NA result for the last predictor


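I am having a problem with the results of my Cox model in R. Whenever I enter more than one predictor, the last one (in alphabetical order) comes out as NA, and I don't know how to fix it. I have tried various variables (continuous and categorical). I also changed the alphabetical order to rule out a problem with that predictor's data.

Can anyone help me?

Thanks in advance!

Here are my code and the results: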

cfit <- coxph(Surv(time, status) ~ a + b + c  , data=xdata) ; cfit

Call:
coxph(formula = Surv(time, status) ~ a + b + c, data = xdata)

       coef exp(coef)  se(coef)      z        p
a -0.002962  0.997043  0.001423 -2.081   0.0374
b  0.004366  1.004376  0.000861  5.071 3.95e-07
c        NA        NA  0.000000     NA       NA

Likelihood ratio test=19.63  on 2 df, p=5.476e-05
n= 374, number of events= 374 

All numeric variables used in this example are depth measurements in metres, and the dataset contains no NAs:

a <- xdata$Depth_min  ; xdata <- cbind(xdata, a) ; xdata <- xdata[complete.cases(a), ] ; name_a <- "Minimum Depth [m]" ; rm(a) ; print(xdata$a)
b <- xdata$Depth_mean ; xdata <- cbind(xdata, b) ; xdata <- xdata[complete.cases(b), ] ; name_b <- "Mean Depth [m]"    ; rm(b) ; print(xdata$b)
c <- xdata$Depth_max  ; xdata <- cbind(xdata, c) ; xdata <- xdata[complete.cases(c), ] ; name_c <- "Maximum Depth [m]" ; rm(c) ; print(xdata$c)

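Data and background: I am using survival analysis (SA) to explore differences in the first detection/publication of species. The year of publication is my event ("death"). With the Cox model I am trying to find out which variables influence the early detection of a species. I am using: Bodysize, Min/Mean/Max_Depth, Substrate, Life_habit, Professional Collector.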

> dput(head(xdata))
structure(list(No. = c(260, 356, 1, 318, 256, 387), 
Class = c("Malacostraca", "Malacostraca", "Copepoda", "Malacostraca", "Malacostraca", "Malacostraca"), 
Order = c("Decapoda", "Decapoda", "Calanoidea", "Decapoda", 
"Decapoda", "Stomatopoda"), 
Suborder = c("Dendrobranchiata", "Pleocyemata", NA, "Pleocyemata", "Dendrobranchiata", "Unipeltata"), 
Infraorder = c(NA, "Caridea", NA, "Brachyura", NA, NA), 
Family = c("Luciferidae", "Palaemonidae", "Acartiidae", "Ocypodidae", "Penaeidae", "Squillidae"), 
Genus = c("Lucifer", "Palaemon", "Acartia (Acartia)", "Ocypode", "Parapenaeus", "Squilla"), 
Species = c("typus", "adspersus", "negligens", "cursor", "longirostris", "mantis"),   
Year_publ_first = c(1898, 1927, 1929, 1929, 1931, 1931), 
Linnean = c(1758, 1758, 1758, 1758, 1758, 1758), 
Survival_Time = c(140, 169, 171, 171, 173, 173), 
Survival_Status = c(1, 1, 1, 1, 1, 1), 
TS_comb = c(10, 21, 1.27, 35, 86, 165), 
Depth_min = c(45, 0, 0, 0, 1, NA), 
Depth_max = c(137, 60, 200, 0, 180, NA),
Depth_mean = c(91, 30, 100, 0, 90.5, NA),  
Substratum = c("pelagic", "soft", NA, "soft", "soft", NA), 
Life_habit = c("pelagic", "epifaunal", "pelagic", "epifaunal", "epifaunal", "epifaunal"),
Coll_Professional = c("yes", "yes", "yes", "yes", "yes", "yes"), 
`trophic level` = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))
r na cox-regression
1 answer

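I suspect that you have summarised your raw data, and that the raw data contain only two depth observations (for each species?). Your three depth measures are perfectly collinear, so your model is overparameterised. As a result, whichever of these effects you add to the model last will be inestimable. Given the (small) dataset you have provided, this is unavoidable. The problem has nothing to do with the alphabetical order of the term names.

To demonstrate the collinearity, consider: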

library(dplyr)

xdata %>% 
  select(Depth_min, Depth_mean, Depth_max) %>% 
  mutate(Check = 2 * Depth_mean - (Depth_max + Depth_min))
# A tibble: 6 × 4
  Depth_min Depth_mean Depth_max Check
      <dbl>      <dbl>     <dbl> <dbl>
1        45       91         137     0
2         0       30          60     0
3         0      100         200     0
4         0        0           0     0
5         1       90.5       180     0
6        NA       NA          NA    NA
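
The check above shows that Depth_mean is exactly the average of Depth_min and Depth_max in every complete row. As a rough sketch of what this does to the model, the following reproduces the situation on simulated data and then refits with only two of the three depth terms. The data frame `sim` and its values are invented for illustration and are not the asker's data; only the variable names are borrowed from the question.

```r
library(survival)

# Hypothetical stand-in for xdata: integer depths so that the mean is
# an exact linear combination of the minimum and the maximum.
set.seed(1)
n <- 200
sim <- data.frame(Depth_min = sample(0:50, n, replace = TRUE))
sim$Depth_max  <- sim$Depth_min + 2 * sample(0:100, n, replace = TRUE)
sim$Depth_mean <- (sim$Depth_min + sim$Depth_max) / 2
sim$time   <- rexp(n, rate = 0.01)  # arbitrary survival times
sim$status <- 1                     # every record is an event, as in the question

# Three depth columns, but they only span two dimensions.
X <- model.matrix(~ Depth_min + Depth_mean + Depth_max, data = sim)[, -1]
qr(X)$rank  # 2 rather than 3

# Full model: one of the depth terms cannot be estimated.
full <- coxph(Surv(time, status) ~ Depth_min + Depth_mean + Depth_max,
              data = sim)
coef(full)

# Reduced model: drop any one of the three collinear terms.
reduced <- coxph(Surv(time, status) ~ Depth_min + Depth_max, data = sim)
coef(reduced)
```

Which term to drop is a modelling decision rather than a technical one: any two of the three depth measures carry the same information, so the fitted linear predictor is the same whichever pair is kept.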