假设我有兴趣预测泰坦尼克号沉没后不同社会阶层的生还概率。由于这不是随机对照试验,各组不太可能保持平衡,因此我还想考虑因性别和年龄而导致的生存差异。我运行了一个 Logit 模型,但我发现沟通比值比很困难,而风险差异和风险比率对于外行人来说更容易理解。我的问题是:
library(tidyverse)
## Load Titanic library to get the dataset
library(titanic)
library(marginaleffects)
## Load the datasets
data("titanic_train")
data("titanic_test")
## Setting Survived column for test data to NA
titanic_test$Survived <- NA
## Combining Training and Testing dataset
complete_data <- rbind(titanic_train, titanic_test) %>%
filter(!Survived == "Unknown") %>%
mutate(Pclass = factor(Pclass),
Sex = factor(Sex),
Survived = factor(Survived))
lrm_unadj <- glm(Survived ~ Pclass,
data = complete_data, family = binomial(link = "logit"))
lrm_adj <- glm(Survived ~ Pclass + Sex + Age,
data = complete_data, family = binomial(link = "logit"))
# Risk difference (difference in probabilities) - should this be similar to LPM?
avg_comparisons(lrm_unadj, variables = "Pclass")
avg_comparisons(lrm_adj, variables = "Pclass")
lm(as.numeric(Survived) ~ Pclass, data = complete_data)%>%
lmtest::coeftest(., vcov = sandwich::vcovHC, type = "HC3") %>%
broom::tidy(conf.int = T)
lm(as.numeric(Survived) ~ Pclass + Sex + Age, data = complete_data)%>%
lmtest::coeftest(., vcov = sandwich::vcovHC, type = "HC3") %>%
broom::tidy(conf.int = T)
我强烈建议您阅读此页面,它使用相同的数据详细解释了所有这些:https://marginaleffects.com/vignettes/comparisons.html
风险差异是否与概率差异相同,进而与线性概率模型的系数(即概率变化)大致相同?
是的。
边际效应如何计算...
有两种基本方法。首先,你可以这样做:
PClass
固定为“1st”,并计算每行的预测概率。PClass
固定为“2nd”并计算每行的预测概率。这就是这样的代码的作用:
avg_comparisons(fit)
第二个策略是:
SexCode
=1、Age
=25。PClass
="1st"PClass
="2nd"这就是这样的代码的作用:
comparisons(fit, newdata = datagrid(SexCode=1, Age=25))
再次请参阅文档。这一切都进行了非常详细的解释,并有许多手动计算的示例: