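My `df` is shown below. The `x` and `y` variables take the values 0, 1 and NA, while `z` is a numeric variable ranging from 0 to 5. I want to compute the conditional mean of the instances equal to 1 in `x` and `y` (that is, the share of 1s among the non-missing values), the ordinary mean of the `z` values, and their respective confidence intervals.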
library(tidyverse)

df <- tribble(
~"name", ~"region", ~x, ~y, ~z,
"A", "reg1", 0, 1, 1,
"A", "reg1", 1, 1, NA,
"B", "reg1", 1, 0, 4,
"C", "reg2", 1, 0, 2,
"B", "reg2", 0, NA, 0,
"C", "reg1", NA, 0, 5,
"C", "reg1", 0, 1, 2,
"B", "reg1", NA, 1, 3,
"B", "reg2", 1, NA, NA,
"A", "reg2", 1, 1, 1,
"A", "reg2", 0, 1, 4,
"A", "reg2", 1, 1, 2,
"A", "reg1", 0, 1, 3,
)
I want a final tidy table with columns like these (only two rows are shown, for illustration):
df1 <- tribble(
~"name", ~"region", ~"Indicator", ~"mean/prevalence", ~"Upper interval", ~"Lower interval",
"A", "reg1", "x", 66, 68.5, 62.3,
"A", "reg1", "z", 2.3, 2.5, 2.1,
)
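My problem is how to organise my `dplyr` verbs. I did it as follows, but it is wrong because every interval calculation uses the group size from `n()`, which is the same for all three variables (it counts every row in the group, including rows where the variable is NA):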
df %>%
group_by(name, region) %>%
summarise(
meanX = mean(x == 1, na.rm = TRUE)*100,
nX = n(),
Xlower_ci = mean(x == 1)*100 - qt(1- 0.05/2, (n() - 1))*sd(x == 1)/sqrt(n()),
Xupper_ci = mean(x == 1)*100 + qt(1- 0.05/2, (n() - 1))*sd(x == 1)/sqrt(n()),
meanY = mean(y == 1, na.rm = TRUE)*100,
nY = n(),
Ylower_ci = mean(y == 1)*100 - qt(1- 0.05/2, (n() - 1))*sd(y == 1)/sqrt(n()),
Yupper_ci = mean(y == 1)*100 + qt(1- 0.05/2, (n() - 1))*sd(y == 1)/sqrt(n()),
meanZ = mean(z, na.rm = TRUE),
nZ = n(),
Zlower_ci = mean(z) - qt(1- 0.05/2, (n() - 1))*sd(z)/sqrt(n()),
Zupper_ci = mean(z) + qt(1- 0.05/2, (n() - 1))*sd(z)/sqrt(n()),
)
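To make the issue concrete, here is a minimal sketch on a made-up four-row group. The `toy` tibble and its column names are assumptions for illustration only, not part of my data. It compares `n()` with a count of the non-missing values, and shows what happens to `mean()` when `na.rm` is left out, as in my interval lines:

```r
library(dplyr)
library(tibble)

# Hypothetical toy group: 4 rows, one of them missing in x
toy <- tibble(g = "a", x = c(1, 0, NA, 1))

counts <- toy %>%
  group_by(g) %>%
  summarise(
    n_rows  = n(),                         # counts every row, NA included
    n_valid = sum(!is.na(x)),              # counts only the non-missing x values
    mean_na = mean(x == 1),                # NA, because na.rm is not set
    mean_ok = mean(x == 1, na.rm = TRUE)   # share of 1s among non-missing values
  )

print(counts)
```

So the sample size that belongs in each interval would seem to be something like `sum(!is.na(x))` rather than `n()`.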
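If I could produce a correct version of the wide summary table that this code is meant to give, I could then use `pivot_longer()` to reach `df1`, which is the final result.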
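One possible reordering, offered as a sketch rather than a definitive solution, is to call `pivot_longer()` first and summarise afterwards, so that each indicator gets its own non-missing sample size. The column names `Indicator`, `value`, `estimate`, `margin`, `lower` and `upper` are my own choices here. The interval is the same t-based formula as in my attempt, with `x` and `y` scaled to percentages before the standard deviation is taken. Whether a t-interval is appropriate for a 0/1 variable is a separate question.

```r
library(dplyr)
library(tidyr)
library(tibble)

df <- tribble(
  ~name, ~region, ~x, ~y, ~z,
  "A", "reg1", 0, 1, 1,
  "A", "reg1", 1, 1, NA,
  "B", "reg1", 1, 0, 4,
  "C", "reg2", 1, 0, 2,
  "B", "reg2", 0, NA, 0,
  "C", "reg1", NA, 0, 5,
  "C", "reg1", 0, 1, 2,
  "B", "reg1", NA, 1, 3,
  "B", "reg2", 1, NA, NA,
  "A", "reg2", 1, 1, 1,
  "A", "reg2", 0, 1, 4,
  "A", "reg2", 1, 1, 2,
  "A", "reg1", 0, 1, 3
)

result <- df %>%
  # One row per (name, region, indicator, value)
  pivot_longer(c(x, y, z), names_to = "Indicator", values_to = "value") %>%
  # Drop missing values so n() below counts only valid observations
  filter(!is.na(value)) %>%
  # Express the 0/1 indicators as percentages so their mean is a prevalence
  mutate(value = if_else(Indicator %in% c("x", "y"), value * 100, value)) %>%
  group_by(name, region, Indicator) %>%
  summarise(
    n        = n(),
    estimate = mean(value),
    # t-based half-width; undefined when a group has a single observation
    margin   = if (n > 1) qt(1 - 0.05 / 2, df = n - 1) * sd(value) / sqrt(n) else NA_real_,
    .groups  = "drop"
  ) %>%
  mutate(lower = estimate - margin, upper = estimate + margin)

print(result)
```

With this sample data several groups contain only one non-missing value, so their intervals come out as NA, and a group whose values are all missing (for example `y` for "B" in "reg2") drops out of the result entirely.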