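I am having a problem with the output of my Cox model in R. Whenever I enter more than one predictor, the last one (in alphabetical order) comes out as NA, and I don't know how to fix it. I have tried a variety of variables (continuous and categorical). I also changed the alphabetical order to rule out a problem with the data of any single predictor.
Can anyone help me?
Thanks in advance!
Here is my code and the result: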
cfit <- coxph(Surv(time, status) ~ a + b + c , data=xdata) ; cfit
Call:
coxph(formula = Surv(time, status) ~ a + b + c, data = xdata)
coef exp(coef) se(coef) z p
a -0.002962 0.997043 0.001423 -2.081 0.0374
b 0.004366 1.004376 0.000861 5.071 3.95e-07
c NA NA 0.000000 NA NA
Likelihood ratio test=19.63 on 2 df, p=5.476e-05
n= 374, number of events= 374
All numeric variables used in this example are depth measurements in metres, and there are no NAs in the dataset:
a <- xdata$Depth_min ; xdata <- cbind(xdata, a) ; xdata <- xdata[complete.cases(a), ] ; name_a <- "Minimum Depth [m]" ; rm(a) ; print(xdata$a)
b <- xdata$Depth_mean ; xdata <- cbind(xdata, b) ; xdata <- xdata[complete.cases(b), ] ; name_b <- "Mean Depth [m]" ; rm(b) ; print(xdata$b)
c <- xdata$Depth_max ; xdata <- cbind(xdata, c) ; xdata <- xdata[complete.cases(c), ] ; name_c <- "Maximum Depth [m]" ; rm(c) ; print(xdata$c)
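Data and background: I am using survival analysis (SA) to explore differences in the first detection/publication of species. The year of publication is my event ("death"). With the Cox model I am trying to find out which variables influence the early detection of a species. I am using: Bodysize, Min/Mean/Max_Depth, Substrate, Life_habit, Professional Collector,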
> dput(head(xdata))
structure(list(No. = c(260, 356, 1, 318, 256, 387),
Class = c("Malacostraca", "Malacostraca", "Copepoda", "Malacostraca", "Malacostraca", "Malacostraca"),
Order = c("Decapoda", "Decapoda", "Calanoidea", "Decapoda",
"Decapoda", "Stomatopoda"),
Suborder = c("Dendrobranchiata", "Pleocyemata", NA, "Pleocyemata", "Dendrobranchiata", "Unipeltata"),
Infraorder = c(NA, "Caridea", NA, "Brachyura", NA, NA),
Family = c("Luciferidae", "Palaemonidae", "Acartiidae", "Ocypodidae", "Penaeidae", "Squillidae"),
Genus = c("Lucifer", "Palaemon", "Acartia (Acartia)", "Ocypode", "Parapenaeus", "Squilla"),
Species = c("typus", "adspersus", "negligens", "cursor", "longirostris", "mantis"),
Year_publ_first = c(1898, 1927, 1929, 1929, 1931, 1931),
Linnean = c(1758, 1758, 1758, 1758, 1758, 1758),
Survival_Time = c(140, 169, 171, 171, 173, 173),
Survival_Status = c(1, 1, 1, 1, 1, 1),
TS_comb = c(10, 21, 1.27, 35, 86, 165),
Depth_min = c(45, 0, 0, 0, 1, NA),
Depth_max = c(137, 60, 200, 0, 180, NA),
Depth_mean = c(91, 30, 100, 0, 90.5, NA),
Substratum = c("pelagic", "soft", NA, "soft", "soft", NA),
Life_habit = c("pelagic", "epifaunal", "pelagic", "epifaunal", "epifaunal", "epifaunal"),
Coll_Professional = c("yes", "yes", "yes", "yes", "yes", "yes"),
`trophic level` = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
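I suspect that you have summarised your raw data, and that the raw data contain only two depth observations (for each species?). There is perfect collinearity between your three depth measurements, so your model is over-parameterised. As a result, whichever of these effects you add to the model last will be inestimable. Given the (small) dataset you have provided, this is unavoidable. The problem has nothing to do with the alphabetical order of the term names.
To demonstrate the collinearity, consider: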
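library(dplyr)  # provides %>%, select() and mutate()

xdata %>%
  select(Depth_min, Depth_mean, Depth_max) %>%
  # Check is 0 whenever Depth_mean is exactly the midpoint of min and max
  mutate(Check = 2 * Depth_mean - (Depth_max + Depth_min))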
# A tibble: 6 × 4
Depth_min Depth_mean Depth_max Check
<dbl> <dbl> <dbl> <dbl>
1 45 91 137 0
2 0 30 60 0
3 0 100 200 0
4 0 0 0 0
5 1 90.5 180 0
6 NA NA NA NA
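The same behaviour can be reproduced on simulated data, which may make the cause easier to see. The sketch below is only an illustration under stated assumptions: the data frame `sim` and its values are made up (they are not your data), and `Depth_mean` is constructed as the exact midpoint of `Depth_min` and `Depth_max`, which is what the check above suggests for your dataset.

```r
library(survival)  # coxph() and Surv()

set.seed(1)
n <- 200

# Simulated data (an assumption for illustration, not the asker's data).
# Integer depths keep the midpoint exact in floating point.
sim <- data.frame(
  Depth_min = sample(0:50, n, replace = TRUE),
  Depth_max = sample(51:200, n, replace = TRUE)
)
sim$Depth_mean <- (sim$Depth_min + sim$Depth_max) / 2
sim$time   <- rexp(n, rate = 0.01)  # arbitrary survival times
sim$status <- 1                     # every subject has the event

# Three columns, but only two linearly independent ones
X <- as.matrix(sim[, c("Depth_min", "Depth_mean", "Depth_max")])
design_rank <- qr(X)$rank
print(design_rank)

# With all three terms, coxph() cannot estimate the last one
# (it typically also warns that the X matrix is singular)
fit_all <- coxph(Surv(time, status) ~ Depth_min + Depth_mean + Depth_max,
                 data = sim)
print(coef(fit_all))

# Dropping any one of the three depth terms gives an estimable model
fit_two <- coxph(Surv(time, status) ~ Depth_min + Depth_max, data = sim)
print(coef(fit_two))
```

In practice this means choosing at most two of the three depth variables (or a single derived one, such as the depth range) for the model; which one to keep is a subject-matter decision rather than a statistical one.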