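I am running an analysis in R of the effect of canopy cover (OverheadCover, a proportion bounded between 0 and 1) and the number of carcasses placed at the same location (CarcassNumber, a factor with 2 levels) on the proportion of a carrion eaten by birds (ProportionBirdsScavenging, a proportion bounded between 0 and 1). I plot this interaction by modelling the effect of OverheadCover on ProportionBirdsScavenging separately for each level of CarcassNumber, and then drawing the fitted curves in the same graph.
After doing this, I noticed a difference between the SEs calculated by plot_model(glmm_interaction, type = "int") and the SEs I calculated myself. That is where my investigation started. It took me through the source code of plot_model, plot_type_int and ggpredict, and ended at ggpredict_helper. Unfortunately I did not find where the SEs are actually calculated, but I did find evidence that the SEs differ. I had always felt confident that I was calculating the SEs correctly, but now I am not so sure. Please see my code below.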
#rm(list = ls())
library(glmmTMB)
library(dplyr)
data_both <- data.frame(ProportionBirdsScavenging = c(0.406192519926425, 0.871428571428571, 0.452995391705069, 0.484821428571429, 0.795866569978245, 0.985714285714286, 0.208571428571429, 0.573982970671712, 0.694285714285714, 0.930204081632653, 0.0483709273182957, 0.0142857142857143, 0.661904761904762, 0.985714285714286, 0.0142857142857143, 0.0142857142857143),
pointWeight = c(233, 17, 341, 128, 394, 46, 5, 302, 10, 35, 57, 39, 12, 229, 28, 116),
OverheadCover = c(0.671, 0.04, 0.46, 0.65, 0.02, 0, 0.8975, 0.585, 0.6795, 0.0418, 0.5995, 0.6545, 0.02, 0, 0.92, 0.585),
CarcassNumber = as.factor(c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2)),
Area = c("Hamert", "KempenBroek", "KempenBroek", "KempenBroek", "Markiezaat", "Markiezaat", "Meinweg", "Valkenhorst", "Hamert", "KempenBroek", "KempenBroek", "KempenBroek", "Markiezaat", "Markiezaat", "Meinweg", "Valkenhorst"))
data_both$pointWeight_scaled <- scales::rescale(data_both$pointWeight, to = c(0.0001,1)) # rescale weights
glmm_interaction <- glmmTMB(ProportionBirdsScavenging ~ OverheadCover * CarcassNumber + (1|Area), data = data_both, beta_family(link = "logit"), weights = pointWeight_scaled)
ggeffects::ggpredict(glmm_interaction, terms = c("OverheadCover", "CarcassNumber [1:2]")) # SE's calculated by r
# Calculate the SE's for CarcassNumber 1
df_first_carcasses <- filter(data_both, CarcassNumber == 1) # create df with only first carcasses
myglmm <- glmmTMB(ProportionBirdsScavenging ~ OverheadCover, data = df_first_carcasses, beta_family(link = "logit"), weights = pointWeight_scaled)
new.xglmm <- expand.grid(OverheadCover = seq(min(data_both$OverheadCover), max(data_both$OverheadCover), length.out = 1000)) %>%
mutate(Area = "Hamert", pointWeight_scaled = 1) # pad new.xglmm with an arbitrary value for Area and pointWeight_scaled, then exclude them in predict, otherwise error -> https://stackoverflow.com/questions/54411851/mgcv-how-to-use-exclude-argument-in-predict-gam
new.yglmm <- data.frame(predict(myglmm, new.xglmm, type = "link", exclude = c("Area","pointWeight_scaled"), se.fit = TRUE)) %>% # exclude Area and pointWeight_scaled from the prediction
mutate(ProportionBirdsScavenging = plogis(fit)) %>% # calculate the ProportionBirdsScavenging on response scale
rename(SE.untransformed = se.fit, untransformed.predictions = fit)
addTheseglmm1 <- mutate(data.frame(new.xglmm, new.yglmm), # calculate the lwr and upr bounds using the untransformed predictions and SE, then transformed to probability
lwr = plogis(untransformed.predictions - SE.untransformed),
upr = plogis(untransformed.predictions + SE.untransformed))
addTheseglmm1[c(1,45,501,653,707,1000),c(1,6,5)] # compare my SE's to the ggpredict SE's
# calculated by ggpredict > manually calculated
# x | Predicted | SE | > SE.untransformed
# ------------------------- > -------------------
# 0.00 | 0.82 | 0.44 | > 0.43
# 0.04 | 0.80 | 0.41 | > 0.40
# 0.46 | 0.56 | 0.22 | > 0.21
# 0.60 | 0.47 | 0.26 | > 0.25
# 0.65 | 0.44 | 0.29 | > 0.28
# 0.92 | 0.28 | 0.47 | > 0.46
# ( CarcassNumber = 1 )
# Same for CarcassNumber 2
df_second_carcasses <- filter(data_both, CarcassNumber == 2) # create df with only second carcasses
myglmm <- glmmTMB(ProportionBirdsScavenging ~ OverheadCover, data = df_second_carcasses, beta_family(link = "logit"), weights = pointWeight_scaled)
new.xglmm <- expand.grid(OverheadCover = seq(min(data_both$OverheadCover), max(data_both$OverheadCover), length.out = 1000)) %>%
mutate(Area = "Hamert", pointWeight_scaled = 1)
new.yglmm <- data.frame(predict(myglmm, new.xglmm, type = "link", exclude = c("Area","pointWeight_scaled"), se.fit = TRUE)) %>%
mutate(ProportionBirdsScavenging = plogis(fit)) %>%
rename(SE.untransformed = se.fit, untransformed.predictions = fit)
addTheseglmm2 <- mutate(data.frame(new.xglmm, new.yglmm),
lwr = plogis(untransformed.predictions - SE.untransformed),
upr = plogis(untransformed.predictions + SE.untransformed))
addTheseglmm2[c(1,45,501,653,707,1000),c(1,6,5)]
# calculated by ggpredict > manually calculated
# x | Predicted | SE | > SE.untransformed
# ------------------------- > -------------------
# 0.00 | 0.96 | 1.01 | > 1.21
# 0.04 | 0.95 | 0.94 | > 1.10
# 0.46 | 0.16 | 0.86 | > 0.96
# 0.60 | 0.04 | 1.11 | > 1.32
# 0.65 | 0.02 | 1.22 | > 1.46
# 0.92 | 0.00 | 1.83 | > 2.31
# ( CarcassNumber = 2 )
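The difference is most pronounced for CarcassNumber 2. Are my calculations incorrect? I suspect the difference might have something to do with including pointWeight_scaled in the new data and then excluding it in the predict function. If I don't do this, predict returns an error saying Error in eval(extras, data, env) : object 'pointWeight_scaled' not found. Is this a common problem? I read in the question "mgcv: How to use 'exclude' argument in predict.gam?" that including the variable and then excluding it fixes the problem. Is this not the right approach?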
I hope someone can shed some light on this issue.
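For reference, the manual bounds in the code above are computed on the link (logit) scale and then back-transformed with plogis. A sketch of that calculation is given below. The notation is an assumption of this sketch and does not appear in the code: \hat{\eta}(x) stands for the link-scale prediction (untransformed.predictions) and \mathrm{SE}(x) for its standard error (SE.untransformed).

```latex
\[
\hat{p}(x) = \frac{1}{1 + e^{-\hat{\eta}(x)}}, \qquad
\mathrm{lwr}(x) = \frac{1}{1 + e^{-\left(\hat{\eta}(x) - \mathrm{SE}(x)\right)}}, \qquad
\mathrm{upr}(x) = \frac{1}{1 + e^{-\left(\hat{\eta}(x) + \mathrm{SE}(x)\right)}}
\]
```

Written this way, lwr and upr form a band of plus or minus one SE around the prediction. An approximate 95% interval under a normal approximation on the link scale would use 1.96 times SE(x) instead.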
Hm, when I run your example, I get the same results as you expected: