我需要按县获取多种药物和多个县的阳性率。真实的数据包括更多的药品和更多的县。 NA 值表示未订购测试,因此不应将其计入样本总数。
另外,我需要使用 prop.test(x, y, conf.level = 0.95) 来获取每个县的比例和置信区间。
这是模拟数据:
county <- c("Erie", "Orange", "Erie", "Orange", "Erie", "Orange", "Erie", "Orange", "Erie", "Orange", "Erie", "Orange")
drug1 <- c("Positive", "Negative", "Negative", "Positive", NA, "Negative", "Positive", "Negative", "Positive", "Negative", "Positive", "Negative")
drug2 <- c("Positive", NA, "Negative", "Negative", "Negative", "Positive", "Positive", NA, "Negative", "Negative", "Negative", "Positive")
data <- data.frame(county, drug1, drug2)
问题是我无法使用简单的groupby和summary来计算比率,如下所示。我必须使用 prop.test() 并获取每个县的比率和置信区间。
data |>
group_by(county) |>
summarise(drug1 = sum(drug1=="Positive", na.rm = TRUE)/
sum(!is.na(drug1)),
drug2 = sum(drug2=="Positive", na.rm = TRUE)/
sum(!is.na(drug2))
) |>
ungroup()
county <- c("Erie", "Orange", "Erie", "Orange", "Erie", "Orange",
"Erie", "Orange", "Erie", "Orange", "Erie", "Orange")
drug1 <- c("Positive", "Negative", "Negative", "Positive", NA,
"Negative", "Positive", "Negative", "Positive", "Negative", "Positive", "Negative")
drug2 <- c("Positive", NA, "Negative", "Negative", "Negative",
"Positive", "Positive", NA, "Negative", "Negative", "Negative", "Positive")
data1 <- data.frame(county, drug1, drug2)
最好将数据保留为完全“长”格式。您可以使用 f.ex 对其进行转换。
melt
来自 reshape2
。
library(reshape2)
data1 <- melt(data1, id="county", variable.name="drug")
然后您通过指定因素或使用公式(如图所示)
split
进行分组。 lapply
覆盖各组,将值列的 table
发送到 prop.test
。
prop.res <- lapply(split(data1, ~ county+drug), function(x) prop.test(table(x$value)))
结果是一个
htest
对象列表,您可以再次使用 lapply
从中提取值。一次一个……
lapply(prop.res, `[[`, "p.value")
# $Erie.drug1
# Negative
# 0.375
#
# $Orange.drug1
# Negative
# 0.21875
#
# $Erie.drug2
# Negative
# 0.6875
#
# $Orange.drug2
# [1] 1
#
或多个。
lapply(prop.res, `[`, c("p.value", "conf.int"))
# $Erie.drug1
# $Erie.drug1$p.value
# Negative
# 0.375
#
# $Erie.drug1$conf.int
# [1] 0.0050507634 0.7164179361
# attr(,"conf.level")
# [1] 0.95
#
#
# $Orange.drug1
# $Orange.drug1$p.value
# Negative
# 0.21875
#
# $Orange.drug1$conf.int
# [1] 0.35876542 0.99578926
# attr(,"conf.level")
# [1] 0.95
#
#
# $Erie.drug2
# $Erie.drug2$p.value
# Negative
# 0.6875
# …