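I'm trying to group by program name and compute the total of each column for every program. There are NAs mixed into the data.
The code below only counts the non-NA observations; it doesn't actually add up the numbers. Do I need some kind of ifelse here? I suspect !is.na is what makes it count all non-NA observations, but if I remove it, every total comes back as NA.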
df %>%
  group_by(ProgramName) %>%
  summarise(ED = sum(!is.na(HighSchool)), EMP = sum(!is.na(Employment)))
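Alternatively, is there a way to group by program name and count an observation only when its value is 1 in either column, rather than computing a total? That is closer to what I want anyway. Any help would be appreciated.
To answer both questions: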
library(dplyr)
df <- structure(list(ProgramName = c("Program A", "Program A", "Program A",
"Program A", "Program B", "Program B", "Program B", "Program B",
"Program C", "Program C", "Program C", "Program C"), HighSchool = c(1L,
0L, 0L, 1L, 1L, 0L, 1L, NA, 1L, 1L, NA, 1L), Employment = c(0L,
0L, 1L, 0L, 1L, 1L, 1L, NA, 0L, 1L, NA, 1L)), class = "data.frame", row.names = c(NA,
-12L))
df %>%
  group_by(ProgramName) %>%
  summarise(across(HighSchool:Employment, ~ if (all(is.na(.))) NA else sum(., na.rm = TRUE)))
# A tibble: 3 × 3
  ProgramName HighSchool Employment
  <chr>            <int>      <int>
1 Program A            2          1
2 Program B            2          3
3 Program C            3          2
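The difference between the original attempt and the `na.rm = TRUE` version is easiest to see on a single vector. Here is a minimal base R sketch, where `x` and `all_na` are made-up vectors standing in for one program's values (they are not taken from `df`):

```r
# Hypothetical vector standing in for one group's HighSchool values
x <- c(1L, 0L, 1L, NA)

n_non_na    <- sum(!is.na(x))        # counts non-NA entries, since TRUE is treated as 1
total_naive <- sum(x)                # NA, because a single NA propagates through sum()
total       <- sum(x, na.rm = TRUE)  # drops the NA before adding

# With na.rm = TRUE an all-NA group sums to 0, which is why the answer
# checks all(is.na(.)) first and returns NA for such a group instead
all_na  <- c(NA_integer_, NA_integer_)
guarded <- if (all(is.na(all_na))) NA else sum(all_na, na.rm = TRUE)

print(c(n_non_na = n_non_na, total_naive = total_naive, total = total))
```

So `sum(!is.na(x))` answers "how many values are present", while `sum(x, na.rm = TRUE)` answers "what do the present values add up to".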
# This is the one you state that you actually want
df %>%
  group_by(ProgramName) %>%
  summarise(ED = sum(HighSchool == 1, na.rm = TRUE),
            EMP = sum(Employment == 1, na.rm = TRUE))
# A tibble: 3 × 3
  ProgramName    ED   EMP
  <chr>       <int> <int>
1 Program A       2     1
2 Program B       2     3
3 Program C       3     2
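If "1 in either column" is instead meant as a single count per program (rows where `HighSchool` or `Employment` equals 1), one possible reading can be sketched as follows. The output column name `EITHER` is an assumption chosen for illustration, and `df` is rebuilt here with `data.frame()` using the same values as above so the snippet runs on its own:

```r
library(dplyr)

# Same values as the sample data above, rebuilt so this snippet is self-contained
df <- data.frame(
  ProgramName = rep(c("Program A", "Program B", "Program C"), each = 4),
  HighSchool  = c(1L, 0L, 0L, 1L, 1L, 0L, 1L, NA, 1L, 1L, NA, 1L),
  Employment  = c(0L, 0L, 1L, 0L, 1L, 1L, 1L, NA, 0L, 1L, NA, 1L)
)

# Count rows where at least one of the two columns is 1.
# Rows where the comparison evaluates to NA are dropped by na.rm = TRUE.
either <- df %>%
  group_by(ProgramName) %>%
  summarise(EITHER = sum(HighSchool == 1 | Employment == 1, na.rm = TRUE))

print(either)
```

With the sample data this gives 3 for every program. Note that `NA | TRUE` is `TRUE` in R, so a row with one missing column still counts if the other column is 1; only rows where the comparison itself is `NA` are dropped.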