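I have been trying to use the gtsummary package to build nice two-way tables for an R Markdown report that analyzes a survey. Specifically, I want to look at the relationship between two survey questions, where users can select multiple answers for one of them.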
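I stored the survey answers in long format, so a user who picks several answers simply gets one extra row per answer, all sharing the same user_id.
However, I need the subtotals and percentages for each answer to be based on the number of unique users. I don't know how to avoid double counting. Is there a way to do this with tbl_summary?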
I created some sample R code to explain what I need.

```r
library(dplyr)  # provides %>%

Test <- data.frame(
  user_id   = c("1", "1", "2", "3", "3", "4", "4", "4", "5", "6"),
  Question1 = c("A", "A", "B", "C", "C", "A", "A", "A", "A", "B"),
  Question2 = c("Side-effect 1", "Side-effect 2", "Side-effect 3", "Side-effect 1", "Side-effect 3",
                "Side-effect 1", "Side-effect 2", "Side-effect 3", "Side-effect 3", "Side-effect 1")
)

Test %>%
  gtsummary::tbl_summary(by = Question1, include = c(Question1, Question2))
```
This outputs:
| Characteristic | A, N = 6¹ | B, N = 2¹ | C, N = 2¹ |
|---|---|---|---|
| Question2 | | | |
| Side-effect 1 | 2 (33%) | 1 (50%) | 1 (50%) |
| Side-effect 2 | 2 (33%) | 0 (0%) | 0 (0%) |
| Side-effect 3 | 2 (33%) | 1 (50%) | 1 (50%) |

¹ n (%)
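However, the way I need it, the subtotal for "A" should be 3 instead of 6, because 3 user_ids selected A, and the subtotal for C should be 1, because only one user_id selected C. The percentages should change to reflect this as well. It's fine if column A adds up to more than 100%, since users can select multiple answers. So I would like the output to look like this: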
| Characteristic | A, N = 3¹ | B, N = 2¹ | C, N = 1¹ |
|---|---|---|---|
| Question2 | | | |
| Side-effect 1 | 2 (67%) | 1 (50%) | 1 (100%) |
| Side-effect 2 | 2 (67%) | 0 (0%) | 0 (0%) |
| Side-effect 3 | 2 (67%) | 1 (50%) | 1 (100%) |

¹ n (%)
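Is there a way to count distinct user_ids to get sub_totals similar to `Test %>% group_by(Question1) %>% summarize(sub_total = n_distinct(user_id))`, or is there a way to override the sub_totals manually? I would also like to add applicable statistical tests, but that can wait for now.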
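To make the double counting concrete, here is a small sketch that puts the row count (which is what the N in the first table reflects) next to the distinct-user count per Question1 answer. It only uses `n()` and `n_distinct()` from dplyr; the names `rows` and `sub_totals` are just illustrative.

```r
library(dplyr)

Test <- data.frame(
  user_id   = c("1", "1", "2", "3", "3", "4", "4", "4", "5", "6"),
  Question1 = c("A", "A", "B", "C", "C", "A", "A", "A", "A", "B"),
  Question2 = c("Side-effect 1", "Side-effect 2", "Side-effect 3", "Side-effect 1", "Side-effect 3",
                "Side-effect 1", "Side-effect 2", "Side-effect 3", "Side-effect 3", "Side-effect 1")
)

sub_totals <- Test %>%
  group_by(Question1) %>%
  summarise(
    rows      = n(),                   # one row per answer given, so multi-answer users count several times
    sub_total = n_distinct(user_id)    # each user counted once
  )

sub_totals
# A: 6 rows but 3 users; B: 2 rows, 2 users; C: 2 rows but 1 user
```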
I don't know how to do this with gtsummary, but you could try to replicate the table the hard way, for example:
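```r
library(dplyr)  # the .by argument needs dplyr >= 1.1.0
library(tidyr)

Test %>%
  # number of distinct users per Question1 answer
  mutate(.by = Question1, users = n_distinct(user_id)) %>%
  # put that user count into the column label, e.g. "A (N = 3)"
  mutate(Question1 = paste0(Question1, " (N = ", users, ")")) %>%
  # one row per Question1/Question2 pair: count distinct users and
  # divide by the user total for that Question1 answer
  summarise(
    .by = c(Question1, Question2),
    n = paste0(n_distinct(user_id), " (", round(100 * n_distinct(user_id) / first(users)), "%)")
  ) %>%
  pivot_wider(names_from = Question1, values_from = n, values_fill = "0 (0%)")
```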
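If you would rather keep a real gtsummary table (for example to add tests later), one possible route, not taken from the answer above, is to reshape so that each user is a single row and each side effect becomes a TRUE/FALSE column. gtsummary summarises logical columns as dichotomous variables, so the N in each header should become the number of users and each percentage should use that denominator. This is a sketch that assumes every user gave exactly one Question1 answer (true in the sample data); `Test_wide` and `selected` are illustrative names. The `tbl_summary()` call is guarded so the reshaping still runs if gtsummary is not installed.

```r
library(dplyr)
library(tidyr)

Test <- data.frame(
  user_id   = c("1", "1", "2", "3", "3", "4", "4", "4", "5", "6"),
  Question1 = c("A", "A", "B", "C", "C", "A", "A", "A", "A", "B"),
  Question2 = c("Side-effect 1", "Side-effect 2", "Side-effect 3", "Side-effect 1", "Side-effect 3",
                "Side-effect 1", "Side-effect 2", "Side-effect 3", "Side-effect 3", "Side-effect 1")
)

# One row per user; each side effect becomes a logical column.
# Assumption: each user_id has a single Question1 answer.
Test_wide <- Test %>%
  distinct(user_id, Question1, Question2) %>%
  mutate(selected = TRUE) %>%
  pivot_wider(names_from = Question2, values_from = selected, values_fill = FALSE)

# Logical columns are treated as dichotomous, so each side effect is one row
# showing n (%) of users in that Question1 group.
if (requireNamespace("gtsummary", quietly = TRUE)) {
  tbl <- gtsummary::tbl_summary(
    Test_wide,
    by = Question1,
    include = starts_with("Side-effect")
  )
}
```

With the sample data this should give headers A, N = 3 / B, N = 2 / C, N = 1 and the percentages from the desired table. The only visible difference is that there is no "Question2" header row, because each side effect is now its own variable.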