predict() function produces values that differ from manual calculations in glmer

Question

I'm trying to get predicted probabilities for the data source from a glmer model (source A and source B, coded as 0 or 1). Using example data:

library(lme4)  # provides glmer()

set.seed(123)
n <- 7052
Df <- data.frame(
  source = sample(c(0, 1), n, replace = TRUE, 
      prob = c(0.719, 0.221)),  
  Response.number = sample(1:20, n, replace = TRUE),  
  Item.number = sample(1:40, n, replace = TRUE), 
  Ps.number = sample(1:40, n, replace = TRUE)  
)


# Intercept-only model with nested and crossed random intercepts
Model1 <- glmer(source ~ (1 | Response.number/Item.number) +
                  (1 | Ps.number),
                data = Df, family = binomial,
                control = glmerControl(optimizer = "bobyqa"))

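According to https://sebastiansauer.github.io/convert_logit2prob/, the manual calculation

exp(b) / (1 + exp(b))

should yield the same predicted probability as the mean of the values returned by predict():

pred <- predict(Model1, type = "response")
mean(pred)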

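I tried this with several kinds of practice data and it generally works (in the example above, the value is 0.23199). With my real data, however, the value I get from predict() (0.59) differs slightly from the value I get manually (0.57). I know that's not much, but the discrepancy doesn't appear with any other data.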

head(Df_real)
      source    Response.number  Item.number  Ps.number
           0               1         1         1
           0               2         1         1
           1               3         1         1
           1               4         1         1
           0               5         1         1
           0               6         1         1
           0               1         2         1
           0               2         2         1
           1               3         2         1
           1               4         2         1
           0               5         2         1
           0               6         2         1
           0               1         1         2
           0               2         1         2
           1               3         1         2
           1               4         1         2
           0               5         1         2
           0               6         1         2
           0               1         2         2
           0               2         2         2
           1               3         2         2
           1               4         2         2
           0               5         2         2

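and so on.

The data are nested, i.e. there are roughly the same number of participants for each response value, the same number of responses for each item value, and so on. Could this be the source of the discrepancy? If so, how should it be handled? Is predict() the appropriate function?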

Answer 1

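When you run predict() on a glmer model, it uses the variables present in the original data, including the random effects, to estimate the probabilities, so you can't simply apply exp(b)/(1 + exp(b)) to the fixed-effect coefficients.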

To see this, let's pass a data frame containing a few values of the random-effect variables to the newdata argument of predict():

predict(Model1, newdata = data.frame(Item.number = 1, 
                                     Response.number = c(1, 2), 
                                     Ps.number = 1), type = 'response')  
#>         1         2 
#> 0.2261900 0.2405297

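Since the model has no fixed effects other than the intercept, the overall probability (with the random effects set to zero) would be: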

b <- fixef(Model1)
exp(b)/(1 + exp(b))
#> (Intercept) 
#>   0.2319048
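
The same population-level value can also be requested from predict() itself by dropping the random effects with re.form = NA, which is part of lme4's predict method for fitted mixed models. The sketch below is only an illustration: the toy data frame, the grouping factor g, and the simulated effect sizes are assumptions, not the asker's data.

```r
library(lme4)

set.seed(1)
n_groups    <- 30
n_per_group <- 50
g <- factor(rep(seq_len(n_groups), each = n_per_group))  # hypothetical grouping factor
u <- rnorm(n_groups, sd = 1)                             # simulated group intercepts
toy <- data.frame(
  y = rbinom(length(g), 1, plogis(-1 + u[as.integer(g)])),
  g = g
)

fit <- glmer(y ~ 1 + (1 | g), data = toy, family = binomial)

# Conditional predictions: fixed intercept plus each group's estimated effect
p_cond <- predict(fit, type = "response")

# Population-level predictions: random effects set to zero
p_pop <- predict(fit, re.form = NA, type = "response")

range(p_cond)             # differs from group to group
unique(round(p_pop, 10))  # one value for every row
plogis(fixef(fit))        # the same value, computed from the fixed intercept
```

With re.form = NA every row gets the same probability, which matches the manual calculation on the fixed intercept.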

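So it really depends on what you want to predict, i.e. whether you want to take the random effects into account. If you do, you can use predict(); otherwise you can calculate the probability manually from the fixed effects.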
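
One plausible reason the mean of the conditional predictions need not equal the probability computed from the fixed intercept is that the inverse logit is nonlinear: averaging probabilities is not the same as transforming an average log odds. The base R sketch below illustrates this with made-up numbers; the intercept b0 and the random-effect standard deviation are hypothetical and not estimated from any model in this thread.

```r
set.seed(42)

b0 <- 0.3                               # hypothetical fixed intercept (logit scale)
u  <- rnorm(1e5, mean = 0, sd = 1.5)    # hypothetical random intercepts

p_typical <- plogis(b0)                 # probability when the random effect is 0
p_average <- mean(plogis(b0 + u))       # average of the unit-specific probabilities

c(typical = p_typical, average = p_average)
```

In this simulation the averaged probability sits closer to 0.5 than the probability at a random effect of zero, and the gap grows with the random-effect variance. When the estimated random effects are close to zero, as in the random practice data, the two numbers nearly coincide.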

As a side note, the base R function plogis() is probably the simplest way to convert log odds to probabilities, and it clearly works here: using type = "response" is equivalent to plogis(predict(Model1, type = "link")):

all(
  plogis(predict(Model1, type = "link")) == predict(Model1, type = "response")
)
#> [1] TRUE

The manual calculation is fine too, although you will get very small floating-point differences:

b <- predict(Model1, type = "link")

hist(exp(b)/(1 + exp(b)) - predict(Model1, type = 'response'))
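
Because of such rounding differences, comparing the two calculations with == can be fragile in general; a tolerance-based comparison such as all.equal() is usually safer. A minimal base R sketch, using an arbitrary grid of logit values rather than model output:

```r
eta <- seq(-10, 10, by = 0.1)        # hypothetical values on the logit scale

manual  <- exp(eta) / (1 + exp(eta)) # hand-written inverse logit
builtin <- plogis(eta)               # base R inverse logit

max(abs(manual - builtin))           # largest absolute difference, if any
isTRUE(all.equal(manual, builtin))   # equality within numerical tolerance
```

For extreme log odds, plogis() is also the more robust choice, since exp(eta) overflows for very large eta.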

In short, a sensible way to compute the model's overall probability by hand is

plogis(fixef(Model1))
#> (Intercept) 
#>   0.2319048
