Allow n_distinct() when computing subtotals and percentages, to handle a one-to-many relationship between categorical variables

Question · Votes: 0 · Answers: 1

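I've been trying to use the gtsummary package to create nice two-way tables for an R Markdown report analyzing a survey. I'm looking at the relationship between two survey questions, where users could select multiple answers for one of them.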

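I've stored the survey answers in long format: a user who selects multiple answers simply gets one extra row per additional answer, all sharing the same user_id.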

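However, I need the subtotals and the percentage for each answer to be based on the number of unique users, not the number of rows. I can't figure out how to prevent this double counting. Is there a way to do it with tbl_summary?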

I've created some example R code to explain what I need.

library(dplyr)   # provides the %>% pipe

Test <- data.frame(user_id   = c("1","1","2","3","3","4","4","4","5","6"),
                   Question1 = c("A","A","B","C","C","A","A","A","A","B"),
                   Question2 = c("Side-effect 1", "Side-effect 2", "Side-effect 3", "Side-effect 1", "Side-effect 3",
                                 "Side-effect 1", "Side-effect 2", "Side-effect 3", "Side-effect 3", "Side-effect 1"))

# tbl_summary() counts rows, so a user with several Question2 answers is counted several times
Test %>%
  gtsummary::tbl_summary(by = Question1, include = c(Question1, Question2))

This outputs:

Characteristic      A, N = 6¹   B, N = 2¹   C, N = 2¹
Question2
    Side-effect 1   2 (33%)     1 (50%)     1 (50%)
    Side-effect 2   2 (33%)     0 (0%)      0 (0%)
    Side-effect 3   2 (33%)     1 (50%)     1 (50%)
¹ n (%)


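However, the way I need it, the subtotal for "A" should be 3 rather than 6, because 3 user_ids chose A, and the subtotal for C should be 1, because only one user_id chose C. The percentages should change to reflect this as well. It's fine if column A adds up to more than 100%, since users can choose multiple answers. So I'd like the output to look like this: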

Characteristic      A, N = 3¹   B, N = 2¹   C, N = 1¹
Question2
    Side-effect 1   2 (67%)     1 (50%)     1 (100%)
    Side-effect 2   2 (67%)     0 (0%)      0 (0%)
    Side-effect 3   2 (67%)     1 (50%)     1 (100%)
¹ n (%)

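Is there a way to count distinct user_ids to get sub_totals similar to

Test %>% group_by(Question1) %>% summarize(sub_total = n_distinct(user_id))

or is there a way to manually override the sub_totals? I'd also like to add the applicable tests, but that can wait for now.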
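A minimal sketch of the distinct-user arithmetic described above, using plain dplyr rather than gtsummary. It assumes dplyr is installed; the object names `sub_totals` and `cell_pct` are illustrative and not from the question. The numerator also uses n_distinct(user_id), so a user is counted at most once per side effect.

```r
library(dplyr)

# Example data from the question: one row per (user, selected side effect)
Test <- data.frame(
  user_id   = c("1","1","2","3","3","4","4","4","5","6"),
  Question1 = c("A","A","B","C","C","A","A","A","A","B"),
  Question2 = c("Side-effect 1", "Side-effect 2", "Side-effect 3", "Side-effect 1", "Side-effect 3",
                "Side-effect 1", "Side-effect 2", "Side-effect 3", "Side-effect 3", "Side-effect 1")
)

# Denominator: number of distinct users per Question1 answer
sub_totals <- Test %>%
  group_by(Question1) %>%
  summarise(sub_total = n_distinct(user_id), .groups = "drop")

# Numerator: distinct users per (Question1, Question2) cell,
# joined to the denominator to get a percentage of users
cell_pct <- Test %>%
  group_by(Question1, Question2) %>%
  summarise(n = n_distinct(user_id), .groups = "drop") %>%
  left_join(sub_totals, by = "Question1") %>%
  mutate(pct = round(100 * n / sub_total))
```

Here `sub_totals` holds 3, 2 and 1 for A, B and C, and `cell_pct` gives 67% for every A cell and 100% for both C cells, matching the desired table. Combinations nobody chose (Side-effect 2 for B and C) have no row at all and would need to be filled in as 0.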

Tags: r, gtsummary
1 Answer · Votes: 0

I don't know how to achieve this with gtsummary itself, but you could replicate the table the hard way, e.g.

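library(dplyr)
library(tidyr)

Test %>% 
  # Count distinct users per Question1 answer and put that N into the column label
  mutate(
    .by = Question1,
    users = n_distinct(user_id),
    Question1 = paste0(Question1, " (N = ", users, ")")
  ) %>% 
  # One row per (Question1, Question2) cell formatted as "n (pct%)", counting distinct users;
  # first(users) keeps summarise() to a single row per group, so unique() is no longer needed
  summarise(
    .by = c(Question1, Question2),
    n = paste0(n_distinct(user_id), " (", round(100 * n_distinct(user_id) / first(users)), "%)")
  ) %>%
  # One column per Question1 answer; combinations nobody chose become "0 (0%)"
  pivot_wider(names_from = Question1, values_from = n, values_fill = "0 (0%)")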
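An alternative sketch that stays within gtsummary (not part of the original answer, so treat it as an untested suggestion): reshape the long data to one row per user with one logical column per side effect, so that each `by` group size becomes a count of users instead of rows. It assumes each user has exactly one Question1 answer, as in the example data; `Test_wide` and `selected` are illustrative names.

```r
library(dplyr)
library(tidyr)
library(gtsummary)

Test <- data.frame(
  user_id   = c("1","1","2","3","3","4","4","4","5","6"),
  Question1 = c("A","A","B","C","C","A","A","A","A","B"),
  Question2 = c("Side-effect 1", "Side-effect 2", "Side-effect 3", "Side-effect 1", "Side-effect 3",
                "Side-effect 1", "Side-effect 2", "Side-effect 3", "Side-effect 3", "Side-effect 1")
)

# One row per user, one TRUE/FALSE column per side effect
# (assumes Question1 is constant within each user_id)
Test_wide <- Test %>%
  distinct(user_id, Question1, Question2) %>%
  mutate(selected = TRUE) %>%
  pivot_wider(names_from = Question2, values_from = selected, values_fill = FALSE)

# Logical columns are summarised as dichotomous (n and % of TRUE),
# and the "N =" in each column header now counts users rather than rows
tbl <- Test_wide %>%
  tbl_summary(by = Question1, include = -user_id)
```

The layout differs slightly from the original: each side effect becomes its own row instead of a level nested under a Question2 heading, but the counts and percentages are computed per user.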

© www.soinside.com 2019 - 2024. All rights reserved.