Confidence intervals for a normal fit to cumulative probability in ggplot

Question (1 vote, 1 answer)

Edit: Sorry, I'm new to this community. I have tried to make the question clearer with sample data and code.

Here is the data (output of dput, assigned to Alldata so that the code below can find it):

Alldata <- structure(list(`Sample Name` = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 
11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22), Type = c("A", 
"A", "A", "A", "B", "B", "B", "B", "A", "A", "A", "A", "A", "A", 
"A", "B", "B", "B", "B", "B", "B", "B"), Size = c(1, 1, 1, 1, 
1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3), Height = c(270, 
280, 290, 295, 292, 285, 305, 330, 125, 130, 140, 142, 123, 117, 
140, 135, 132, 145, 160, 170, 136, 154)), row.names = c(NA, -22L
), class = c("tbl_df", "tbl", "data.frame"))

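I now split the data with filter. I'm not sure this is the smartest approach, but it has worked so far. First I split by Size into two groups (1 and 3), then I split each size by Type (A and B), which gives four groups in total.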

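library(dplyr)    # filter()
library(ggplot2)  # used for the plot below

# Size is numeric, so compare against numbers rather than strings
SizeOne <- filter(Alldata, Size == 1)
SizeThree <- filter(Alldata, Size == 3)

SizeonA <- filter(SizeOne, Type == "A")
SizeoneB <- filter(SizeOne, Type == "B")
SizeThreeA <- filter(SizeThree, Type == "A")
SizeThreeB <- filter(SizeThree, Type == "B")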

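Here is the code that plots the cumulative probability of the four groups. I then use stat_function to add a fitted Gaussian (normal) CDF to each cumulative plot.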

p2 <- ggplot() +
  # Empirical cumulative probability for each size, colored by Type
  stat_ecdf(data = SizeOne, aes(x = Height, color = Type), geom = "point", size = 1.2, pad = FALSE) +
  stat_ecdf(data = SizeThree, aes(x = Height, color = Type), geom = "point", size = 1, pad = FALSE) +
  scale_color_manual(values = c("#e73a00", "#002ee7")) +
  labs(title = "Cumulative probability", y = "Cumulative Probability", x = "Height") +
  # Normal CDF for each group, using that group's sample mean and sd.
  # Colors follow the manual scale above (A = "#e73a00", B = "#002ee7").
  stat_function(data = SizeThreeB, fun = pnorm, color = "#002ee7", args = list(mean = mean(SizeThreeB$Height), sd = sd(SizeThreeB$Height))) +
  stat_function(data = SizeThreeA, fun = pnorm, color = "#e73a00", args = list(mean = mean(SizeThreeA$Height), sd = sd(SizeThreeA$Height))) +
  stat_function(data = SizeoneB, fun = pnorm, color = "#002ee7", args = list(mean = mean(SizeoneB$Height), sd = sd(SizeoneB$Height))) +
  stat_function(data = SizeonA, fun = pnorm, color = "#e73a00", args = list(mean = mean(SizeonA$Height), sd = sd(SizeonA$Height)))

p2

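  1. My question now is: how can I add 99%, 95%, and 90% confidence intervals (as bands) to the Gaussian fits (not to the empirical cumulative points)?
  2. Second, how can I add error bars to the cumulative probability points (the red and blue points)?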

[Image: my plot so far]

r ggplot2 confidence-interval cumulative-frequency
1 Answer (0 votes)

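I think it is probably safer to compute pnorm outside of ggplot. For your code, you don't need to filter at all. Keep the data as it is and use ggplot's aesthetics and facets to group it.

You can use the same groups to compute the probability distributions, and therefore use the same aesthetics and facets in the probability plot.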
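As a small illustration of grouping instead of filtering, the sketch below uses base R only to compute, for each Size/Type combination, the sample mean and standard deviation that pnorm needs. The data frame is rebuilt compactly from the question's values, and the name fit_params is made up for this example.

```r
# Sample data from the question, rebuilt compactly as a plain data frame.
alldata <- data.frame(
  Type   = rep(c("A", "B", "A", "B"), times = c(4, 4, 7, 7)),
  Size   = rep(c(1, 3), times = c(8, 14)),
  Height = c(270, 280, 290, 295, 292, 285, 305, 330,
             125, 130, 140, 142, 123, 117, 140,
             135, 132, 145, 160, 170, 136, 154)
)

# One list element per Size/Type combination, with names such as "1.A" and "3.B".
groups <- split(alldata, interaction(alldata$Size, alldata$Type))

# Group size, mean and standard deviation of Height for each group.
# fit_params is a hypothetical name used only in this sketch.
fit_params <- t(sapply(groups, function(g) {
  c(n = nrow(g), mean = mean(g$Height), sd = sd(g$Height))
}))

fit_params
```

Each row of fit_params then holds the parameters of one fitted normal CDF, so no separate object per group is required.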

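Disclaimer

I am not a statistician. I simply took the upper and lower confidence limits of your mean and used them as the mean in new pnorm calculations... which may not be correct. I agree with user @Stephane Laurent that you would probably get a more correct answer on how to compute the confidence intervals on CrossValidated.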
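The question asks for 90%, 95%, and 99% bands. Under the same (possibly incorrect) assumption as in the disclaimer, namely that shifting the mean to its confidence limits is an acceptable way to build a band, the idea could be sketched in base R as follows. The helper mean_ci and the column names are hypothetical, and only one group (Size 3, Type A) is shown.

```r
# Heights of the Size 3 / Type A group from the question's data.
height <- c(125, 130, 140, 142, 123, 117, 140)

# Hypothetical helper: t-based confidence interval for the mean at a given level.
mean_ci <- function(x, level = 0.95) {
  n <- length(x)
  half_width <- qt(1 - (1 - level) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(lower = mean(x) - half_width, mean = mean(x), upper = mean(x) + half_width)
}

conf_levels <- c(0.90, 0.95, 0.99)
q <- seq(min(height), max(height))

# One block of rows per confidence level, stacked into a single data frame.
bands <- do.call(rbind, lapply(conf_levels, function(level) {
  ci <- mean_ci(height, level)
  data.frame(
    level  = level,
    Height = q,
    p_mean = pnorm(q, mean = ci[["mean"]], sd = sd(height)),
    # A larger assumed mean shifts the CDF to the right, giving the lower curve.
    p_low  = pnorm(q, mean = ci[["upper"]], sd = sd(height)),
    p_high = pnorm(q, mean = ci[["lower"]], sd = sd(height))
  )
}))

head(bands)
```

A data frame like bands could be passed to geom_ribbon with ymin = p_low and ymax = p_high, drawing one ribbon per level. Note that this only reflects uncertainty in the mean; the standard deviation is treated as known.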

library(tidyverse)
library(Rmisc)

split_data <- alldata %>% split(., interaction(.$Size, .$Type))

df_pnorm <- map(split_data, function(x) {
  range_h <- range(x$Height)
  q <- seq(range_h[1], range_h[2])
  CI <- CI(x$Height)
  sd <- sd(x$Height)
  p_mean <- pnorm(q = q, mean = CI["mean"], sd = sd)
  p_lower <- pnorm(q = q, mean = CI["lower"], sd = sd)
  p_upper <- pnorm(q = q, mean = CI["upper"], sd = sd)
  data.frame(Height = q, p_mean, p_lower, p_upper)
}) %>%
  bind_rows(.id = "size.type") %>%
  separate(size.type, c("Size", "Type"))

ggplot(alldata, aes(x = Height, color = Type, group = interaction(Type, Size))) +
  geom_ribbon(data = df_pnorm, aes(ymin = p_upper, ymax = p_lower), alpha = 0.3) +
  geom_line(data = df_pnorm, aes(y = p_mean)) +
  stat_ecdf(geom = "point", pad = FALSE) +
  facet_grid(~ Size, scales = "free_x")
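The second part of the question, error bars on the cumulative probability points, is a separate problem. One possible approach, which is an assumption of this sketch and not something taken from the thread, is a pointwise Clopper-Pearson binomial interval from base R's binom.test: at each observed height, the number of observations at or below it is treated as a binomial count out of n. The name ecdf_ci is hypothetical, and only one group is shown.

```r
# Heights of the Size 3 / Type A group from the question's data.
height <- c(125, 130, 140, 142, 123, 117, 140)
n <- length(height)

# Distinct observed heights and the number of observations at or below each.
x <- sort(unique(height))
k <- sapply(x, function(v) sum(height <= v))

# Pointwise 95% Clopper-Pearson interval for each ECDF value k / n.
ci <- t(sapply(k, function(count) {
  as.numeric(binom.test(count, n, conf.level = 0.95)$conf.int)
}))

# ecdf_ci is a hypothetical name used only in this sketch.
ecdf_ci <- data.frame(Height = x, p = k / n, lower = ci[, 1], upper = ci[, 2])
ecdf_ci
```

Such a data frame could be drawn with geom_errorbar(aes(ymin = lower, ymax = upper)). With groups this small the intervals are very wide, and they are pointwise rather than simultaneous, so they should be read with care.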


Created on 2020-04-24 by the reprex package (v0.3.0)

Data

alldata <- structure(list(`Sample Name` = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22), Type = c("A", "A", "A", "A", "B", "B", "B", "B", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B"), Size = c(1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3), Height = c(270, 280, 290, 295, 292, 285, 305, 330, 125, 130, 140, 142, 123, 117, 140, 135, 132, 145, 160, 170, 136, 154)), row.names = c(NA, -22L ), class = c("tbl_df", "tbl", "data.frame"))