How to summarise by group and overall with dplyr when some counts are missing?


My question is tangential to a previous question: How to summarise by group and also get a summary of the whole dataset using dplyr in R

Using the reprex from the answer there to set up the new question:

library(tidyverse)
set.seed(500)
dat <- tibble(
  treatment = sample(c("Group1", "Group2", "Group3"), 100, replace = TRUE),
  recruitment_strategy = sample(c("Strategy 1", "Strategy 2", "Strategy 3", "Strategy 4", "Strategy 5"), 100, replace = TRUE),
  Variable_A = rnorm(100),
  Variable_B = rnorm(100),
  Variable_C = rnorm(100)
)

Now remove one treatment x strategy combination so that it never occurs in the data:

dat2 <- dat %>% 
    filter(!(recruitment_strategy == "Strategy 1" & treatment == "Group1"))

and run the previous solution:

dat2 %>%
    inner_join(
        x = count(., treatment, recruitment_strategy) %>% spread(treatment, n),
        y = count(., recruitment_strategy, name = "Overall_dataset"),
        by = "recruitment_strategy"
    ) %>%
    mutate_at(
        .vars = vars(-recruitment_strategy),
        .funs = ~ str_glue("{.} ({scales::percent(. / sum(.), accuracy = 1)})")
    )

The combination that does not occur shows an NA count, and every percentage in that column becomes NA as well:

# A tibble: 5 × 5
  recruitment_strategy Group1  Group2   Group3  Overall_dataset
  <chr>                <glue>  <glue>   <glue>  <glue>         
1 Strategy 1           NA (NA) 13 (30%) 4 (16%) 17 (17%)       
2 Strategy 2           8 (NA)  6 (14%)  6 (24%) 20 (20%)       
3 Strategy 3           6 (NA)  12 (27%) 3 (12%) 21 (21%)       
4 Strategy 4           9 (NA)  4 (9%)   5 (20%) 18 (18%)       
5 Strategy 5           6 (NA)  9 (20%)  7 (28%) 22 (22%)       

My question is: how do I get it to show a 0 count instead of NA? I tried adding the .drop = FALSE argument to count(), but it made no difference:

x = count(., treatment, recruitment_strategy, .drop = FALSE) %>% spread(treatment, n),

Any other ideas? Thanks
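A note on why .drop = FALSE appears to do nothing here: as far as I understand dplyr's grouping behaviour, it only keeps empty groups for unused factor levels. Both treatment and recruitment_strategy are character columns, so there are no levels to expand. The sketch below illustrates this on a small made-up tibble (toy, g and s are hypothetical names, not part of the question's data):

```r
library(dplyr)

# Toy data (hypothetical): the g = "a", s = "x" combination never occurs.
toy <- tibble(
  g = c("a", "a", "b", "b", "b"),
  s = c("y", "y", "x", "y", "y")
)

# Character grouping columns: .drop = FALSE has no factor levels to expand,
# so only the observed combinations are returned.
chr_counts <- count(toy, g, s, .drop = FALSE)

# Factor grouping column: the unobserved level is kept with n = 0.
fct_counts <- toy %>%
  mutate(s = factor(s)) %>%
  count(g, s, .drop = FALSE)

print(chr_counts)  # observed combinations only
print(fct_counts)  # also contains the empty "a" / "x" cell
```

Comparing the two printed tibbles shows that only the factor version contains a row for the combination that is absent from the data.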

Tags: r, dplyr, count

1 answer

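OK, I found my own solution. The trick is to make sure recruitment_strategy is a factor before counting, and to set .drop = FALSE as well. I also realised that spread is somewhat outdated, so I switched to pivot_wider instead: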

dat2 %>%
    mutate(recruitment_strategy = as.factor(recruitment_strategy)) %>%
    inner_join(
        x = count(., treatment, recruitment_strategy, .drop = FALSE) %>% 
            pivot_wider(id_cols = recruitment_strategy,
                        names_from = treatment, 
                        values_from = n),
        y = count(., recruitment_strategy, name = "Overall_dataset"),
        by = "recruitment_strategy"
    ) %>%
    mutate_at(
        .vars = vars(-recruitment_strategy),
        .funs = ~ str_glue("{.} ({scales::percent(. / sum(.), accuracy = 1)})")
    )

# A tibble: 5 × 5
  recruitment_strategy Group1  Group2   Group3  Overall_dataset
  <fct>                <glue>  <glue>   <glue>  <glue>         
1 Strategy 1           0 (0%)  13 (30%) 4 (16%) 17 (17%)       
2 Strategy 2           8 (28%) 6 (14%)  6 (24%) 20 (20%)       
3 Strategy 3           6 (21%) 12 (27%) 3 (12%) 21 (21%)       
4 Strategy 4           9 (31%) 4 (9%)   5 (20%) 18 (18%)       
5 Strategy 5           6 (21%) 9 (20%)  7 (28%) 22 (22%)   
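As a possible alternative (not the approach used above), the missing cell can be filled at the reshaping step with the values_fill argument of pivot_wider, which avoids the factor conversion. The sketch below rebuilds dat2 from the question so it runs on its own; fmt_n_pct, by_group, overall and result are hypothetical names introduced only for illustration, and across() is swapped in for mutate_at:

```r
library(dplyr)
library(tidyr)
library(stringr)

# Rebuild the question's data so the example is self-contained.
set.seed(500)
dat <- tibble(
  treatment = sample(c("Group1", "Group2", "Group3"), 100, replace = TRUE),
  recruitment_strategy = sample(
    c("Strategy 1", "Strategy 2", "Strategy 3", "Strategy 4", "Strategy 5"),
    100, replace = TRUE
  ),
  Variable_A = rnorm(100),
  Variable_B = rnorm(100),
  Variable_C = rnorm(100)
)
dat2 <- dat %>%
  filter(!(recruitment_strategy == "Strategy 1" & treatment == "Group1"))

# Hypothetical helper: format a count vector as "n (pct%)" of its column total.
fmt_n_pct <- function(n) {
  str_glue("{n} ({scales::percent(n / sum(n), accuracy = 1)})")
}

# Per-treatment counts; cells absent from the data are filled with 0.
by_group <- dat2 %>%
  count(treatment, recruitment_strategy) %>%
  pivot_wider(names_from = treatment, values_from = n, values_fill = 0L) %>%
  arrange(recruitment_strategy)

# Counts over the whole dataset.
overall <- count(dat2, recruitment_strategy, name = "Overall_dataset")

result <- by_group %>%
  inner_join(overall, by = "recruitment_strategy") %>%
  mutate(across(-recruitment_strategy, fmt_n_pct))

print(result)
```

Because the zero is filled in before the percentages are computed, sum() no longer sees an NA, so the rest of the Group1 column gets valid percentages too.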