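My question is a follow-up to a previous one: How to summarise by group and get a summary of the whole dataset using dplyr in R.
Using the reprex from the answer there to set up the new question: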
library(tidyverse)
set.seed(500)
dat <- tibble(
  treatment = sample(c("Group1", "Group2", "Group3"), 100, replace = TRUE),
  recruitment_strategy = sample(c("Strategy 1", "Strategy 2", "Strategy 3", "Strategy 4", "Strategy 5"), 100, replace = TRUE),
  Variable_A = rnorm(100),
  Variable_B = rnorm(100),
  Variable_C = rnorm(100)
)
Now remove one treatment × strategy combination so that it never appears in the data:
dat2 <- dat %>%
  filter(!(recruitment_strategy == "Strategy 1" & treatment == "Group1"))
and run the previous solution:
dat2 %>%
  inner_join(
    x = count(., treatment, recruitment_strategy) %>% spread(treatment, n),
    y = count(., recruitment_strategy, name = "Overall_dataset"),
    by = "recruitment_strategy"
  ) %>%
  mutate_at(
    .vars = vars(-recruitment_strategy),
    .funs = ~ str_glue("{.} ({scales::percent(. / sum(.), accuracy = 1)})")
  )
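The missing combination shows an NA count, and because the Group1 column total is then NA, every Group1 percentage is NA as well: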
# A tibble: 5 × 5
recruitment_strategy Group1 Group2 Group3 Overall_dataset
<chr> <glue> <glue> <glue> <glue>
1 Strategy 1 NA (NA) 13 (30%) 4 (16%) 17 (17%)
2 Strategy 2 8 (NA) 6 (14%) 6 (24%) 20 (20%)
3 Strategy 3 6 (NA) 12 (27%) 3 (12%) 21 (21%)
4 Strategy 4 9 (NA) 4 (9%) 5 (20%) 18 (18%)
5 Strategy 5 6 (NA) 9 (20%) 7 (28%) 22 (22%)
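The NA percentages follow from how base R's sum() treats missing values: a single NA in a column makes the column total NA, so every share computed as . / sum(.) becomes NA too. A minimal illustration using the Group1 counts from the output above:

```r
# Group1 counts from the output above; the Strategy 1 cell is missing (NA)
group1 <- c(NA, 8L, 6L, 9L, 6L)

sum(group1)                # NA: one missing value makes the whole total NA
sum(group1, na.rm = TRUE)  # 29
group1 / sum(group1)       # all NA, hence "8 (NA)", "6 (NA)", ...
```

So replacing the missing count with 0 fixes both the count and the percentages for that column.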
My question is: how can I get it to show a 0 count instead of NA? I tried setting the .drop = FALSE argument of count(), but it made no difference:
x = count(., treatment, recruitment_strategy, .drop = FALSE) %>% spread(treatment, n),
Any other ideas? Thanks.
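The reason .drop = FALSE changes nothing here is that, according to the dplyr documentation, it only controls groups formed by factor levels that do not appear in the data; with character columns, count() only returns combinations that actually occur. A minimal sketch on a hypothetical toy tibble (df, g and s are made-up names, not from the question):

```r
library(dplyr)

# Hypothetical toy data: the combination g = "a", s = "y" never occurs
df <- tibble(g = c("a", "b", "b"), s = c("x", "x", "y"))

# Character columns: .drop = FALSE has no effect, the empty group is absent
chr_counts <- count(df, g, s, .drop = FALSE)
nrow(chr_counts)  # 3

# Factor columns: .drop = FALSE keeps every level combination, with n = 0
fct_counts <- df %>%
  mutate(across(c(g, s), factor)) %>%
  count(g, s, .drop = FALSE)
nrow(fct_counts)  # 4, including a / y with n = 0
```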
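OK, I found a solution myself. The trick is to make sure recruitment_strategy is a factor before counting, while also setting .drop = FALSE. I also realised that spread() has been superseded, so I switched to pivot_wider() instead: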
dat2 %>%
  mutate(recruitment_strategy = as.factor(recruitment_strategy)) %>%
  inner_join(
    x = count(., treatment, recruitment_strategy, .drop = FALSE) %>%
      pivot_wider(id_cols = recruitment_strategy,
                  names_from = treatment,
                  values_from = n),
    y = count(., recruitment_strategy, name = "Overall_dataset"),
    by = "recruitment_strategy"
  ) %>%
  mutate_at(
    .vars = vars(-recruitment_strategy),
    .funs = ~ str_glue("{.} ({scales::percent(. / sum(.), accuracy = 1)})")
  )
# A tibble: 5 × 5
recruitment_strategy Group1 Group2 Group3 Overall_dataset
<fct> <glue> <glue> <glue> <glue>
1 Strategy 1 0 (0%) 13 (30%) 4 (16%) 17 (17%)
2 Strategy 2 8 (28%) 6 (14%) 6 (24%) 20 (20%)
3 Strategy 3 6 (21%) 12 (27%) 3 (12%) 21 (21%)
4 Strategy 4 9 (31%) 4 (9%) 5 (20%) 18 (18%)
5 Strategy 5 6 (21%) 9 (20%) 7 (28%) 22 (22%)
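An alternative sketch that skips the factor conversion: pivot_wider() has a values_fill argument that fills the cells of combinations missing from the data (current tidyr releases accept a single value; older ones expected a named list such as list(n = 0)). It also swaps the superseded mutate_at() for across(). The helper fmt_count_pct() is a hypothetical name introduced here, not part of any package, and the Variable_* columns are omitted because they are not needed for the counts.

```r
library(tidyverse)

# Treatment and strategy columns generated as in the question
set.seed(500)
dat <- tibble(
  treatment = sample(c("Group1", "Group2", "Group3"), 100, replace = TRUE),
  recruitment_strategy = sample(paste("Strategy", 1:5), 100, replace = TRUE)
)
dat2 <- dat %>%
  filter(!(recruitment_strategy == "Strategy 1" & treatment == "Group1"))

# values_fill = 0 writes 0 into cells whose combination never occurs
counts_wide <- dat2 %>%
  count(treatment, recruitment_strategy) %>%
  pivot_wider(names_from = treatment, values_from = n, values_fill = 0)

overall <- dat2 %>% count(recruitment_strategy, name = "Overall_dataset")

# Hypothetical helper: format a count column as "n (pct%)",
# with percentages computed within that column
fmt_count_pct <- function(x) {
  str_glue("{x} ({scales::percent(x / sum(x), accuracy = 1)})")
}

summary_tbl <- counts_wide %>%
  inner_join(overall, by = "recruitment_strategy") %>%
  arrange(recruitment_strategy) %>%
  mutate(across(-recruitment_strategy, fmt_count_pct))
summary_tbl
```

The arrange() call is there because pivot_wider() keeps rows in order of first appearance, so Strategy 1 (which first appears under Group2) would likely end up last without it.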