Difference in confidence intervals between plot() and ggplot2


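I am trying to plot the confidence interval (CI) of a regression line in R using both plot() and ggplot2. However, I noticed that the CIs produced by the two approaches are clearly different. I would appreciate help understanding why this discrepancy exists.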

Here is the R code I have been using:

# Loading libraries
library(ggplot2)
library(MethComp)

# Hypothetical data set for Stack Overflow
set.seed(123) # Set seed for reproducibility
data_for_plot <- data.frame(rater1 = rnorm(100, mean = 50, sd = 10))
data_for_plot$rater2 <- data_for_plot$rater1 + runif(100, min = -20, max = 20)

# Prepare for x and y axis coordinates
common_min <- min(c(data_for_plot$rater1, data_for_plot$rater2))
common_min <- floor(common_min / 5) * 5
common_max <- max(c(data_for_plot$rater1, data_for_plot$rater2))
common_max <- ceiling(common_max / 5) * 5
common_breaks <- seq(from = common_min, to = common_max, length.out = 5)

# Passing-Bablok regression
pb_result <- PBreg(data_for_plot$rater1, data_for_plot$rater2)
pb_intercept <- pb_result$coefficients["Intercept", "Estimate"]
pb_intercept_lci <- pb_result$coefficients["Intercept", "2.5%CI"]
pb_intercept_uci <- pb_result$coefficients["Intercept", "97.5%CI"]
pb_slope <- pb_result$coefficients["Slope", "Estimate"]
pb_slope_lci <- pb_result$coefficients["Slope", "2.5%CI"]
pb_slope_uci <- pb_result$coefficients["Slope", "97.5%CI"]

# Manually creating CI ribbon (Problematic segment)
x_vals <- seq(from = common_min, to = common_max, length.out = 100)
upper_bound <- pb_intercept_uci + pb_slope_uci * x_vals
upper_bound <- pmin(upper_bound, common_max)
lower_bound <- pb_intercept_lci + pb_slope_lci * x_vals
lower_bound <- pmax(lower_bound, common_min)
ribbon_data <- data.frame(
  x = x_vals,
  ymin = lower_bound,
  ymax = upper_bound
)

# Generating plots
gg_plot_result <- ggplot(
  data = data_for_plot, aes(x = rater1, y = rater2)) +
  geom_ribbon(data = ribbon_data, aes(x = x, ymin = ymin, ymax = ymax), 
              fill = "#253494", alpha = 0.2, inherit.aes = FALSE)+
  geom_abline(intercept = 0, slope = 1, 
              color = "orange", linewidth = 0.25, linetype = "dashed") +
  geom_abline(intercept = pb_intercept, slope = pb_slope, 
              color = "#253494", linewidth = 1) +  # Plotting the regression line
  geom_point(shape = 21, colour = "black", fill = "white", size = 2.5) +
  labs(title = "Measurement (cm)", x = "Rater 1", y = "Rater 2") +           
  theme_minimal() +                                                      
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.5),
    plot.title = element_text(hjust = 0.5),
    aspect.ratio = 1
  ) +
  coord_fixed(ratio = 1) +
  scale_x_continuous(limits = c(common_min, common_max), breaks = common_breaks) +
  scale_y_continuous(limits = c(common_min, common_max), breaks = common_breaks)

# Generating PB plot directly
pb_plot_result <- plot(
  pb_result, asp = 1,
  xlim = c(common_min, common_max), 
  ylim = c(common_min, common_max),
  main = "Measurement (cm)",
  xlab = "Rater 1",
  ylab = "Rater 2"
)

print(gg_plot_result)

With plot(), I get a plot (call it Figure 1) that shows a relatively narrow CI.

Figure 1

With ggplot2, on the other hand, where I compute the CI ribbon manually and add it to the plot, the CI appears much wider (Figure 2).

Figure 2

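Can anyone shed light on what might be causing this discrepancy? Specifically, what other methods could I use to plot the confidence band, given the computed CIs of the intercept and slope?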

Any insights would be greatly appreciated. Thank you.

r ggplot2 regression visualization confidence-interval
1 Answer

You are computing the confidence interval for the predictions incorrectly. You have

upper_bound <- pb_intercept_uci + pb_slope_uci * x_vals

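and a similar expression for the lower bound. For an ordinary linear regression, the confidence interval at x is the fitted value plus or minus a critical value times a standard error of the form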

sqrt(var_int + 2*cov_int_slope*x + var_slope*x^2)
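As a minimal sketch of that formula, the following uses ordinary least squares (lm(), not Passing-Bablok) on the question's simulated data. The names fit, V, se_fit and manual are illustrative only. It builds the band by hand from the coefficient covariance matrix and compares it with the band from predict():

```r
# Same simulated data as in the question
set.seed(123)
rater1 <- rnorm(100, mean = 50, sd = 10)
rater2 <- rater1 + runif(100, min = -20, max = 20)

# Ordinary least squares, used only to illustrate the formula (not Passing-Bablok)
fit <- lm(rater2 ~ rater1)
V <- vcov(fit)  # 2 x 2 covariance matrix of (intercept, slope)
x_vals <- seq(20, 80, length.out = 100)

# Standard error of the fitted line at each x:
# var_int + 2 * cov_int_slope * x + var_slope * x^2
se_fit <- sqrt(V[1, 1] + 2 * V[1, 2] * x_vals + V[2, 2] * x_vals^2)

# Fitted value plus or minus a t critical value times the standard error
crit <- qt(0.975, df = df.residual(fit))
y_hat <- coef(fit)[1] + coef(fit)[2] * x_vals
manual <- cbind(lwr = y_hat - crit * se_fit, upr = y_hat + crit * se_fit)

# Reference band computed by predict()
ref <- predict(fit, newdata = data.frame(rater1 = x_vals),
               interval = "confidence")

# The two bands should agree up to numerical tolerance
all.equal(unname(manual), unname(ref[, c("lwr", "upr")]))
```

The covariance term is the part that the expression in the question leaves out.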

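However, the code of MethComp:::predict.PBreg shows that something quite different is being computed (though it may well be correct for this kind of analysis). That calculation is probably described in the original Passing and Bablok (1983) reference given in ?PBreg; I haven't checked.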
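To illustrate why plugging the upper limits of both coefficients into the line gives too wide a band, here is a rough sketch. It again uses lm() as a stand-in, so the numbers will not match what PBreg produces, and naive_width and proper_width are illustrative names. When the x values are far from zero, the intercept and slope estimates are strongly negatively correlated, so both are unlikely to sit at their upper limits at once:

```r
# Same simulated data as in the question
set.seed(123)
rater1 <- rnorm(100, mean = 50, sd = 10)
rater2 <- rater1 + runif(100, min = -20, max = 20)

fit <- lm(rater2 ~ rater1)  # OLS stand-in, not Passing-Bablok
x_vals <- seq(20, 80, length.out = 100)

# Rows: (Intercept), rater1; columns: 2.5 %, 97.5 %
ci <- confint(fit)

# Naive band, as in the question: combine the interval endpoints
# of both coefficients
naive_width <- (ci[1, 2] + ci[2, 2] * x_vals) -
  (ci[1, 1] + ci[2, 1] * x_vals)

# Pointwise confidence band for the fitted line
band <- predict(fit, newdata = data.frame(rater1 = x_vals),
                interval = "confidence")
proper_width <- band[, "upr"] - band[, "lwr"]

# Correlation between the intercept and slope estimates (negative here)
cov2cor(vcov(fit))[1, 2]

# Ratios above 1 mean the naive band is wider than the pointwise band
summary(naive_width / proper_width)
```

For this least-squares stand-in the naive band is wider at every positive x, which is consistent with the wider ribbon seen in Figure 2, although the exact band for Passing-Bablok regression is computed differently.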
