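I would like to compute the difference between the means of two variables, with a confidence interval, within each level of another, categorical, variable.
I am interested in confidence intervals for p1, p2 and pdiff.
Thank you very much.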
library(tidyverse)
iris %>%
  mutate(out1 = Sepal.Length < 6,
         out2 = Sepal.Length < 5) %>%
  group_by(Species) %>%
  summarise(p1 = mean(out1),
            p2 = mean(out2),
            pdiff = p1 - p2)
# # A tibble: 3 x 4
#   Species       p1    p2 pdiff
#   <fct>      <dbl> <dbl> <dbl>
# 1 setosa      1     0.4   0.6
# 2 versicolor  0.52  0.02  0.5
# 3 virginica   0.14  0.02  0.12
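One way to get the confidence intervals is with prop.test. You can run the test for each metric (p1, p2 and pdiff) and then use map_dbl (from purrr, which the tidyverse loads) to extract the information you need.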
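To make the extraction step concrete, here is a minimal base-R sketch for a single group (setosa) and a single metric (p2). prop.test returns an object of class htest, and its conf.int element is a length-2 vector holding the lower and upper limits (95% by default); that element is what the map_dbl calls in the answer's code pull out. The variable names setosa_len and p2_test are only illustrative.

```r
# Sepal lengths of the setosa flowers only (illustrative variable name)
setosa_len <- iris$Sepal.Length[iris$Species == "setosa"]

# One-sample test for p2 in this group:
# successes = number of flowers with Sepal.Length < 5, trials = group size
p2_test <- prop.test(sum(setosa_len < 5), length(setosa_len))

class(p2_test)      # "htest"
p2_test$estimate    # the sample proportion, i.e. p2 for setosa
p2_test$conf.int    # lower and upper 95% limits
p2_test$conf.int[1] # the value the answer stores as p2_low for this group
```

Because each prop.test result is stored in a list-column inside summarise, map_dbl applies the same `$conf.int[i]` extraction to every group's test object.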
library(tidyverse)

iris %>%
  mutate(out1 = Sepal.Length < 6,
         out2 = Sepal.Length < 5) %>%
  group_by(Species) %>%
  summarise(p1 = mean(out1),
            p2 = mean(out2),
            pdiff = p1 - p2,
            # create tests for p1, p2 and pdiff and save each output in a list-column
            p1_test = list(prop.test(sum(out1), length(out1))),
            p2_test = list(prop.test(sum(out2), length(out2))),
            # two-sample test: its interval treats out1 and out2 as independent samples
            pdiff_test = list(prop.test(c(sum(out1), sum(out2)),
                                        c(length(out1), length(out2)))),
            # extract the lower and upper confidence limits from the corresponding test
            p1_low = map_dbl(p1_test, ~ .$conf.int[1]),
            p1_high = map_dbl(p1_test, ~ .$conf.int[2]),
            p2_low = map_dbl(p2_test, ~ .$conf.int[1]),
            p2_high = map_dbl(p2_test, ~ .$conf.int[2]),
            pdiff_low = map_dbl(pdiff_test, ~ .$conf.int[1]),
            pdiff_high = map_dbl(pdiff_test, ~ .$conf.int[2])) %>%
  select(-matches("test")) # remove the test columns
# # A tibble: 3 x 10
#   Species       p1    p2 pdiff p1_low p1_high  p2_low p2_high pdiff_low pdiff_high
#   <fct>      <dbl> <dbl> <dbl>  <dbl>   <dbl>   <dbl>   <dbl>     <dbl>      <dbl>
# 1 setosa      1     0.4   0.6  0.911    1     0.267     0.548  0.444        0.756
# 2 versicolor  0.52  0.02  0.5  0.376    0.661 0.00104   0.120  0.336        0.664
# 3 virginica   0.14  0.02  0.12 0.0628   0.274 0.00104   0.120 -0.00371      0.244
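One caveat on the pdiff interval: the two-sample prop.test call treats out1 and out2 as if they came from two independent samples, but here both indicators are measured on the same flowers (every flower with Sepal.Length < 5 also has Sepal.Length < 6), and that interval does not account for the pairing. A possible alternative, which is not part of the answer above, is a paired analysis: the per-flower difference out1 - out2 has mean pdiff, and t.test(..., paired = TRUE) gives a t-based interval for that mean. This is a minimal sketch that assumes a t/normal approximation is acceptable for 0/1 data with 50 flowers per species; the name paired_ci is illustrative.

```r
library(dplyr)
library(purrr)

# Sketch only (not the original answer's method): treat out1 and out2 as paired
# 0/1 indicators on the same flowers and build a t-based CI for the mean of the
# per-flower difference, which equals pdiff = p1 - p2.
paired_ci <- iris %>%
  mutate(out1 = Sepal.Length < 6,
         out2 = Sepal.Length < 5) %>%
  group_by(Species) %>%
  summarise(pdiff = mean(out1) - mean(out2),
            # paired t-test on the indicators, converted to numeric 0/1 first
            test = list(t.test(as.numeric(out1), as.numeric(out2), paired = TRUE)),
            pdiff_low = map_dbl(test, ~ .x$conf.int[1]),
            pdiff_high = map_dbl(test, ~ .x$conf.int[2])) %>%
  select(-test) # drop the list-column holding the test objects

paired_ci
```

The extraction pattern (a list-column of htest objects plus map_dbl) is the same as in the answer; only the test used for pdiff changes.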