Why is the standard deviation of GLM predictions equal to 0?


Here is my dataset:

library(NHANES)
library(plyr)

# Generate example dataset 
myvars <- c("ID","Gender", "Age", "Diabetes","BPSysAve", "BPDiaAve")
nhanes <- as.data.frame(NHANESraw[myvars])
nhanes$age.range <- cut(nhanes$Age, breaks = c(-Inf,45, 65, Inf), labels = c("<45Yrs", "45-65Yrs", ">65Yrs"))
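A side note on the `cut()` call above: with the default `right = TRUE`, these breaks produce the intervals (-Inf, 45], (45, 65] and (65, Inf], so a 45-year-old is labelled "<45Yrs" and a 65-year-old "45-65Yrs". The sketch below uses a handful of made-up ages (not NHANES data) to show where the boundary values land under both settings of `right`. Which convention is appropriate depends on how the strata are meant to be defined.

```r
# Made-up ages chosen to sit on and around the break points
ages <- c(44, 45, 46, 65, 66)
age_labels <- c("<45Yrs", "45-65Yrs", ">65Yrs")

# Default right = TRUE: intervals (-Inf,45], (45,65], (65,Inf]
default_bins <- cut(ages, breaks = c(-Inf, 45, 65, Inf), labels = age_labels)

# right = FALSE: intervals [-Inf,45), [45,65), [65,Inf)
left_closed_bins <- cut(ages, breaks = c(-Inf, 45, 65, Inf), right = FALSE,
                        labels = age_labels)

data.frame(age = ages, right_true = default_bins, right_false = left_closed_bins)
```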

Let's fit a simple GLM with SBP (BPSysAve) as the dependent variable:

# Fitting the GLM model
fit <- glm(BPSysAve ~ Diabetes + age.range, data = nhanes)
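Because both predictors are factors, this model's prediction for any subject is just the intercept plus one Diabetes coefficient plus one age.range coefficient (default treatment contrasts assumed). The sketch below checks that on hypothetical simulated data standing in for NHANES; the data frame `sim`, the effect sizes and the noise level are all invented for illustration.

```r
# Hypothetical simulated data (NOT the NHANES survey)
set.seed(1)
n <- 600
sim <- data.frame(
  Diabetes  = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  age.range = factor(sample(c("<45Yrs", "45-65Yrs", ">65Yrs"), n, replace = TRUE),
                     levels = c("<45Yrs", "45-65Yrs", ">65Yrs"))
)
# Invented stratum effects plus individual-level noise
sim$BPSysAve <- 110 + 3 * (sim$Diabetes == "Yes") +
  c(0, 14, 22)[as.integer(sim$age.range)] + rnorm(n, sd = 15)

fit_sim <- glm(BPSysAve ~ Diabetes + age.range, data = sim)
b <- coef(fit_sim)

# Prediction for a "Yes, >65Yrs" subject assembled from the coefficients:
# no individual-level information enters, only the two stratum effects
manual <- unname(b["(Intercept)"] + b["DiabetesYes"] + b["age.range>65Yrs"])
from_model <- unname(predict(fit_sim,
                             newdata = data.frame(Diabetes = "Yes", age.range = ">65Yrs"),
                             type = "response"))
c(manual = manual, predict = from_model)
```

Since every subject in a stratum shares the same coefficients, all of them receive the same prediction.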

Generating the predicted SBP:

# Producing df with predictions
pred <- data.frame(diabetes = fit$model$Diabetes, 
                  sbp = predict.glm(fit, type = "response"),
                  age.range = fit$model$age.range)
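If the goal is one predicted SBP per stratum together with a measure of its uncertainty, `predict()` can be called on a small grid of the factor levels with `se.fit = TRUE`, which returns the standard error of each predicted mean. This is a sketch on hypothetical simulated data (`sim` and the grid `strata` are illustrative names, not from the question); note that `se.fit` describes uncertainty in the estimated stratum mean, not the spread of individual SBP values.

```r
# Hypothetical simulated data (NOT the NHANES survey)
set.seed(2)
n <- 600
sim <- data.frame(
  Diabetes  = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  age.range = factor(sample(c("<45Yrs", "45-65Yrs", ">65Yrs"), n, replace = TRUE),
                     levels = c("<45Yrs", "45-65Yrs", ">65Yrs"))
)
sim$BPSysAve <- 110 + 3 * (sim$Diabetes == "Yes") +
  c(0, 14, 22)[as.integer(sim$age.range)] + rnorm(n, sd = 15)

fit_sim <- glm(BPSysAve ~ Diabetes + age.range, data = sim)

# One row per Diabetes x age.range combination
strata <- expand.grid(Diabetes = levels(sim$Diabetes),
                      age.range = levels(sim$age.range))

# Predicted stratum means and their standard errors
p <- predict(fit_sim, newdata = strata, type = "response", se.fit = TRUE)
strata$sbp.pred <- p$fit
strata$se.fit <- p$se.fit
strata
```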

Summarizing the SBP predictions by stratum:

# Summarize prediction
ddply(pred, c("diabetes", "age.range"), summarize,
              N = sum(!is.na(sbp)),
              sbp.mean = mean(sbp),
              sd = sd(sbp)
              )
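To see the difference between the spread of the data and the spread of the predictions, the sketch below computes the within-stratum SD of both on hypothetical simulated data (`sim` is invented, not NHANES). The observed values vary from person to person, while the fitted values are constant within each stratum, so their SD is zero up to floating-point rounding.

```r
# Hypothetical simulated data (NOT the NHANES survey)
set.seed(3)
n <- 600
sim <- data.frame(
  Diabetes  = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  age.range = factor(sample(c("<45Yrs", "45-65Yrs", ">65Yrs"), n, replace = TRUE),
                     levels = c("<45Yrs", "45-65Yrs", ">65Yrs"))
)
sim$BPSysAve <- 110 + 3 * (sim$Diabetes == "Yes") +
  c(0, 14, 22)[as.integer(sim$age.range)] + rnorm(n, sd = 15)

fit_sim <- glm(BPSysAve ~ Diabetes + age.range, data = sim)
sim$pred <- fitted(fit_sim)

# SD of the observed SBP in each stratum: individual variation, > 0
obs_sd <- aggregate(BPSysAve ~ Diabetes + age.range, data = sim, FUN = sd)

# SD of the predicted SBP in each stratum: one shared value, so ~0
pred_sd <- aggregate(pred ~ Diabetes + age.range, data = sim, FUN = sd)

obs_sd
pred_sd
```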

Looking at the results:

  diabetes age.range    N sbp.mean sd
1       No    <45Yrs 8616 110.6067  0
2       No  45-65Yrs 2942 124.9528  0
3       No    >65Yrs 1701 133.2779  0
4      Yes    <45Yrs  214 113.6860  0
5      Yes  45-65Yrs  742 128.0321  0
6      Yes    >65Yrs  645 136.3572  0
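The `sbp.mean` column itself shows the additive structure of the main-effects model: the Yes minus No gap is the same in every age group, and the age gaps are the same for both diabetes groups. The arithmetic below uses only the six values copied from the table above.

```r
# sbp.mean values copied from the summary table
no_diab  <- c("<45Yrs" = 110.6067, "45-65Yrs" = 124.9528, ">65Yrs" = 133.2779)
yes_diab <- c("<45Yrs" = 113.6860, "45-65Yrs" = 128.0321, ">65Yrs" = 136.3572)

# Diabetes gap within each age group: 3.0793 in all three
diab_gap <- yes_diab - no_diab
round(diab_gap, 4)

# Age gaps relative to "<45Yrs": 0, 14.3461, 22.6712 in both diabetes groups
age_gap_no  <- no_diab - no_diab["<45Yrs"]
age_gap_yes <- yes_diab - yes_diab["<45Yrs"]
rbind(No = round(age_gap_no, 4), Yes = round(age_gap_yes, 4))
```

In other words, the six numbers are fully determined by the model's coefficients, which is another way of seeing that the predictions carry no individual-level variation.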

I would like to know:

  1. Is my approach correct for obtaining the appropriate SBP mean and SD within the strata above?
  2. Why is the SD equal to 0?
r regression prediction glm standard-deviation
1 Answer

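The reason is that, within each group defined by 'diabetes' and 'age.range', 'sbp' has only one unique value. The model's only predictors are the two factors Diabetes and age.range, so every subject in the same stratum gets exactly the same predicted SBP (the model's estimate for that stratum); the person-to-person variation stays in the residuals, not in the predictions. To describe the spread of SBP within a stratum you would need the SD of the observed BPSysAve values rather than of the predictions. You can confirm the single unique value per group like this: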

library(dplyr)
pred %>% 
   group_by(diabetes, age.range) %>% 
   summarise(n = n_distinct(sbp))
# A tibble: 6 x 3
# Groups:   diabetes [2]
#  diabetes age.range     n
#  <fct>    <fct>     <int>
#1 No       <45Yrs        1
#2 No       45-65Yrs      1
#3 No       >65Yrs        1
#4 Yes      <45Yrs        1
#5 Yes      45-65Yrs      1
#6 Yes      >65Yrs        1


sd(c(15, 15, 15))
#[1] 0
© www.soinside.com 2019 - 2024. All rights reserved.