混合效应模型的 ggpredict() 和 ggemmeans() 计算的预测不同：为什么？

Question

我使用

ggpredict()

1.3.0 包中的函数

ggemmeans()

和

ggeffects

来计算混合效应模型的平均估计值和置信区间（以下简称：CI）。这些函数依赖于 Predict() 和 emmeans() 并使其输出适合 ggplot。这两个函数预测/估计的值的平均值和 CI 均不同。为什么？

以下可重现的示例基于数据集 RIKZ（Janssen e Mulder 2005；Zuur 等人 2007），该数据集着眼于与平均潮汐水位（NAP，米）和暴露程度（低、中、高三个级别的因子）：

rm(list=ls())
if (!require(pacman)) install.packages('pacman'); library(pacman)
p_load(emmeans)
p_load(ggplot2)
p_load(ggeffects)
p_load(lme4, lmerTest, glmmTMB)
# get data:
RIKZ <- read.csv(text = RCurl::getURL(
"https://raw.githubusercontent.com/marcoplebani85/datasets/master/RIKZ.csv"))
str(RIKZ)
# "Exposure" is a factor:
RIKZ$Exposure <- as.factor(RIKZ$Exposure)

在这里，我使用

glmmTMB()

：

将具有泊松分布残差的广义混合效应模型拟合到数据中

mem1 <- glmmTMB(Richness ~ NAP+Exposure + (1 | Beach),
                family="poisson",
                data = RIKZ, REML=T)

模型预测和 CI 根据

ggeffects::ggpredict()

，不考虑随机效应的不确定性（请参阅本页了解为何考虑或不考虑）：

richness.predicted <- ggpredict(mem1, 
terms=c("NAP", "Exposure"), type="fixed")

根据 ggeffects::ggemmeans()

同一模型的预测和 CI，不考虑随机效应的不确定性：

richness.emmeans <- ggemmeans(mem1, 
terms=c("NAP", "Exposure"), type="fixed")

代表两组模型预测和 CI 以及观测数据的图表：

p1 <- plot(richness.predicted, add.data=T) + 
    labs(title="Predictions by ggpredict()") + ylim(0,45) +
    theme(text = element_text(size = 15))
p2 <- plot(richness.emmeans, add.data=T) + 
    labs(title="Predictions by ggemmeans()") + ylim(0,45) +
    theme(text = element_text(size = 15))
ggarrange(p1, p2, ncol=2, labels=c("(a)", "(b)"), 
    common.legend=T, legend="bottom")

为什么两组均值估计值和置信区间不同？

更新：

本页解释了两个函数之间的差异：

[...] ggpredict() 返回的效果[是]条件效果（即这些以某些（参考）因素水平为条件），而 ggemeans() 返回边际均值，因为效果是在因素水平上“边缘化”（或“平均”）。然而，这些差异仅适用于非焦点术语，即剩余 terms 参数中未指定的变量。

继续：

当所有分类预测变量都用术语指定并进一步（非焦点）项只是数字，结果也是相同的（如 ggpredict() 和 ggemeans() 都默认使用平均值来保存非焦点数值变量常量）。

我怀疑混合效应模型的两个函数获得的预测存在差异的原因在于它们如何处理随机效应，但我不知道如何处理。

参考文献：

Janssen, G. 和 Mulder, S. (2005)，荷兰海岸沿线沙滩和冲浪区大型动物的分区。 Oceanologia 47(2): 265-282。
Zuur, A.F.、Ieno, E.N. 和 Smith, G.M. (2007)，分析生态数据。Springer Science & Business Media
https://strengejacke.github.io/ggeffects/articles/introduction_randomeffects.html
https://strengejacke.github.io/ggeffects/articles/technical_differencepredictemmeans.html

Answer 1

这确实是一条评论，不是完整的答案，但也许它可以指出正确的方向来理解

ggpredict

和

ggemmeans

之间的微妙差异，这实际上是

predict.glmmTMB

和

emmeans

之间的差异.

glmmTMB

使用两种估计方法：ML（最大似然）和REML（限制最大似然）。通过适当设置参数

REML

来选择一种或另一种方法。

如果估计方法是 ML，则

ggpredict

和

ggemeans

给出相同的结果。当估计方法为 REML 并且模型具有随机效应时，会出现差异。

fit_glmmTMB <- glmmTMB(
  Richness ~ NAP + Exposure + (1 | Beach),
  family = poisson,
  data = RIKZ,
  REML = FALSE # Use ML to fit the model.
)

ggpredict(
  fit_glmmTMB, terms = fixed_terms, type = "fixed"
)
#> # Predicted counts of Richness
#> 
#> # Exposure = medium
#> 
#>   NAP | Predicted |         95% CI
#> ----------------------------------
#> -1.40 |     13.97 | [10.64, 18.35]
#> -0.80 |     10.30 | [ 8.22, 12.91]
#> -0.20 |      7.60 | [ 6.19,  9.32]
#>  0.40 |      5.60 | [ 4.51,  6.95]
#>  1.00 |      4.13 | [ 3.20,  5.34]
#>  2.20 |      2.25 | [ 1.53,  3.29]
#>
#> [The rest of the output is cut.]

ggemmeans(
  fit_glmmTMB, terms = fixed_terms, type = "fixed"
)
#> # Predicted counts of Richness
#> 
#> # Exposure = medium
#> 
#>   NAP | Predicted |         95% CI
#> ----------------------------------
#> -1.40 |     13.97 | [10.64, 18.35]
#> -0.80 |     10.30 | [ 8.22, 12.91]
#> -0.20 |      7.60 | [ 6.19,  9.32]
#>  0.40 |      5.60 | [ 4.51,  6.95]
#>  1.00 |      4.13 | [ 3.20,  5.34]
#>  2.20 |      2.25 | [ 1.53,  3.29]
#> 
#> [The rest of the output is cut.]

为了进行比较，我还使用

lme4::glmer

安装了 Poisson GLMM。此函数没有

method

选项并使用 ML。再说一次，没有区别。

fit_glmer <- glmer(
  Richness ~ NAP + Exposure + (1 | Beach),
  family = poisson,
  data = RIKZ
)

ggpredict(
  fit_glmer, terms = fixed_terms, type = "fixed"
)
#> # Predicted counts of Richness
#> 
#> # Exposure = medium
#> 
#>   NAP | Predicted |         95% CI
#> ----------------------------------
#> -1.40 |     13.97 | [10.64, 18.34]
#> -0.80 |     10.30 | [ 8.22, 12.91]
#> -0.20 |      7.60 | [ 6.19,  9.32]
#>  0.40 |      5.60 | [ 4.51,  6.95]
#>  1.00 |      4.13 | [ 3.20,  5.34]
#>  2.20 |      2.25 | [ 1.53,  3.29]
#> 
#> [The rest of the output is cut.]

ggemmeans(
  fit_glmer, terms = fixed_terms, type = "fixed"
)
#> # Predicted counts of Richness
#> 
#> # Exposure = medium
#> 
#>   NAP | Predicted |         95% CI
#> ----------------------------------
#> -1.40 |     13.97 | [10.64, 18.34]
#> -0.80 |     10.30 | [ 8.22, 12.91]
#> -0.20 |      7.60 | [ 6.19,  9.32]
#>  0.40 |      5.60 | [ 4.51,  6.95]
#>  1.00 |      4.13 | [ 3.20,  5.34]
#>  2.20 |      2.25 | [ 1.53,  3.29]
#>
#> [The rest of the output is cut.]

混合效应模型的 ggpredict() 和 ggemmeans() 计算的预测不同：为什么？

问题描述投票：0回答：1

1个回答

最新问题

混合效应模型的 ggpredict() 和 ggemmeans() 计算的预测不同：为什么？

问题描述 投票：0回答：1

1个回答

最新问题

问题描述投票：0回答：1