Problem with group by and column sums/counts


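I am trying to group by program name and compute the column total for each program. There are NAs in the mix.

Taking a data frame like this: dataframe

and getting something like this returned: returned data

The code below only counts the non-NA observations. It does not actually add the numbers up. Do I need some kind of ifelse here? I also suspect that !is.na is what makes it count all non-NA observations, but if I remove it, I get NA for all of my totals.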

df %>%  
  group_by(ProgramName) %>%
  summarise(ED = sum(!is.na(HighSchool)), EMP = sum(!is.na(Employment)))

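Alternatively, is there a way to group by program name and count an observation only when it is 1 in either column, rather than computing a total? That is closer to what I want anyway. Any help would be greatly appreciated.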

r dplyr data-wrangling
1 Answer

Answering both questions:

library(dplyr)

df <- structure(list(ProgramName = c("Program A", "Program A", "Program A", 
"Program A", "Program B", "Program B", "Program B", "Program B", 
"Program C", "Program C", "Program C", "Program C"), HighSchool = c(1L, 
0L, 0L, 1L, 1L, 0L, 1L, NA, 1L, 1L, NA, 1L), Employment = c(0L, 
0L, 1L, 0L, 1L, 1L, 1L, NA, 0L, 1L, NA, 1L)), class = "data.frame", row.names = c(NA, 
-12L))


# Sum each column per program; return NA only if a group has no non-NA values
df %>%  
  group_by(ProgramName) %>%
  summarise(across(HighSchool:Employment, ~ if(all(is.na(.))) NA else sum(., na.rm = TRUE)))

# A tibble: 3 × 3
  ProgramName HighSchool Employment
  <chr>            <int>      <int>
1 Program A            2          1
2 Program B            2          3
3 Program C            3          2
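
As for why the original code returned counts, and why dropping !is.na returned NA: the short sketch below shows the behaviour on a plain vector. The vectors x and all_missing are made up for illustration and are not part of the question's data.

```r
# Illustrative vector (hypothetical, not taken from the question's data)
x <- c(1L, 0L, 1L, NA)

n_non_na   <- sum(!is.na(x))        # counts non-NA entries: 3
plain_sum  <- sum(x)                # NA, because a single NA propagates
safe_sum   <- sum(x, na.rm = TRUE)  # 2, NA is dropped before summing

# A group made up entirely of NA sums to 0 with na.rm = TRUE,
# which is why the answer guards with all(is.na(.)) to return NA instead
all_missing <- c(NA_integer_, NA_integer_)
zero_sum    <- sum(all_missing, na.rm = TRUE)
guarded_sum <- if (all(is.na(all_missing))) NA else sum(all_missing, na.rm = TRUE)
```

In the example data no program is entirely NA, so the guard never triggers there; it only matters if such a group shows up.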

# This is the one you state that you actually want
df %>%  
  group_by(ProgramName) %>%
  summarise(ED = sum(HighSchool == 1, na.rm = TRUE),
            EMP = sum(Employment == 1, na.rm = TRUE))

# A tibble: 3 × 3
  ProgramName    ED   EMP
  <chr>       <int> <int>
1 Program A       2     1
2 Program B       2     3
3 Program C       3     2
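
If "1 in either column" was meant as a single count of rows where at least one of the two columns is 1, a possible variant is sketched below. This is one reading of the question, not something the asker confirmed, and the column name Either is made up for illustration. The data frame repeats the same values as the example data so the snippet runs on its own.

```r
library(dplyr)

# Same values as the example data, rebuilt so this snippet is self-contained
df <- data.frame(
  ProgramName = rep(c("Program A", "Program B", "Program C"), each = 4),
  HighSchool  = c(1L, 0L, 0L, 1L, 1L, 0L, 1L, NA, 1L, 1L, NA, 1L),
  Employment  = c(0L, 0L, 1L, 0L, 1L, 1L, 1L, NA, 0L, 1L, NA, 1L)
)

# Count rows per program where at least one column equals 1.
# Rows where the comparison evaluates to NA are dropped by na.rm = TRUE.
either <- df %>%
  group_by(ProgramName) %>%
  summarise(Either = sum(HighSchool == 1 | Employment == 1, na.rm = TRUE))

either
```

Note that in R, NA | TRUE is TRUE, so a row with NA in one column and 1 in the other would still be counted.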