I'm working in R.
I have some data on school staff:
data <- data.frame(person_id = c(1, 2, 3, 4, 5, 6, 7, 8),
disability_status = c("yes", "no", "yes", "no", "yes", "no", "yes", "no"),
age_group = c("20-30","30-40","20-30","30-40","20-30","30-40","20-30","30-40"),
teacher = c("yes", "no", "no", "yes", "no","yes", "no", "yes" ))
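I wrote a function that summarises the data by whichever variable is passed in, counting the staff in each group. The "group_tag" argument is there to help with debugging later in my code.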
library(dplyr) # provides %>%, mutate(), group_by(), summarise(), bind_rows()
group_the_data <- function(data,
variable,
group_tag) {
grouped_output <- data %>%
mutate(flag = 1) %>%
group_by({{variable}}) %>%
summarise(number_staff = sum(flag, na.rm = T)) %>%
mutate(grouping_tag := {{group_tag}})
return(grouped_output)
}
I then use the function to group by disability status, age group, and teacher in turn:
disability_grouped <- group_the_data(data = data,
variable = disability_status,
group_tag = "disability status")
age_group_grouped <- group_the_data(data = data,
variable = age_group,
group_tag = "age group")
role_grouped <- group_the_data(data = data,
variable = teacher,
group_tag = "role")
Once I have the data frames I need, I bind them together:
all_data_grouped <- bind_rows(disability_grouped, age_group_grouped, role_grouped)
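Is there a way to loop over the variables so that I don't need to write the function call three times?
Or would one of the apply functions be a better idea?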
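You can use lapply or purrr::map to iterate over the variables. To do that, we need to loop over column names as strings rather than bare variable names, so the function has to pick() the variable inside group_by().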
library(tidyverse)
group_the_data <- function(data,
variable,
group_tag) {
grouped_output <- data %>%
mutate(flag = 1) %>%
group_by(pick(all_of(variable))) %>% # variable is a column name passed as a string
summarise(number_staff = sum(flag, na.rm = TRUE)) %>%
mutate(grouping_tag := {{group_tag}})
return(grouped_output)
}
purrr::map(colnames(data)[-1], ~ group_the_data(data, variable = .x, group_tag = .x)) %>%
bind_rows()
# A tibble: 6 × 5
  disability_status number_staff grouping_tag      age_group teacher
  <chr>                    <dbl> <chr>             <chr>     <chr>
1 no                           4 disability_status NA        NA
2 yes                          4 disability_status NA        NA
3 NA                           4 age_group         20-30     NA
4 NA                           4 age_group         30-40     NA
5 NA                           4 teacher           NA        no
6 NA                           4 teacher           NA        yes
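For comparison, here is a minimal sketch of the same loop written with base R's lapply instead of purrr::map. The helper name count_by_column is hypothetical (not from the question), and it counts rows with n() rather than summing a flag column, which should give the same totals here apart from the column type (integer rather than double). It assumes dplyr 1.1.0 or later for pick().

```r
library(dplyr)

# Same example data as in the question
data <- data.frame(
  person_id = 1:8,
  disability_status = rep(c("yes", "no"), times = 4),
  age_group = rep(c("20-30", "30-40"), times = 4),
  teacher = c("yes", "no", "no", "yes", "no", "yes", "no", "yes")
)

# Hypothetical helper: `variable` is a column name given as a string
count_by_column <- function(data, variable, group_tag) {
  data %>%
    group_by(pick(all_of(variable))) %>%             # select the column by name
    summarise(number_staff = n(), .groups = "drop") %>%  # rows per group
    mutate(grouping_tag = group_tag)
}

# Every column except person_id
vars <- colnames(data)[-1]

# lapply returns a list of data frames, one per column name
result_list <- lapply(vars, function(v) count_by_column(data, variable = v, group_tag = v))

# Stack them into a single data frame
all_data_grouped <- bind_rows(result_list)
```

Either approach yields a list of data frames that bind_rows() can stack, so the choice between lapply and purrr::map is mostly a matter of taste.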
If you want different values for "variable" and "group_tag", use purrr::map2 to iterate over both in parallel:
purrr::map2(colnames(data)[-1],
c("disability status", "age group", "role"),
~ group_the_data(data, variable = .x, group_tag = .y)) %>%
bind_rows()
# A tibble: 6 × 5
  disability_status number_staff grouping_tag      age_group teacher
  <chr>                    <dbl> <chr>             <chr>     <chr>
1 no                           4 disability status NA        NA
2 yes                          4 disability status NA        NA
3 NA                           4 age group         20-30     NA
4 NA                           4 age group         30-40     NA
5 NA                           4 role              NA        no
6 NA                           4 role              NA        yes
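One possible variation, sketched under assumptions: rather than keeping column names and tags in two separate vectors that must stay in the same order, they could be stored together in a named character vector and iterated with purrr::imap. The helper count_by_column and the lookup vector tags are hypothetical names introduced for illustration. In the formula, .x is the tag (the value) and .y is the column name (the name).

```r
library(dplyr)
library(purrr)

# Same example data as in the question
data <- data.frame(
  person_id = 1:8,
  disability_status = rep(c("yes", "no"), times = 4),
  age_group = rep(c("20-30", "30-40"), times = 4),
  teacher = c("yes", "no", "no", "yes", "no", "yes", "no", "yes")
)

# Hypothetical helper: `variable` is a column name given as a string
count_by_column <- function(data, variable, group_tag) {
  data %>%
    group_by(pick(all_of(variable))) %>%
    summarise(number_staff = n(), .groups = "drop") %>%
    mutate(grouping_tag = group_tag)
}

# Hypothetical lookup: names are column names, values are the tags
tags <- c(
  disability_status = "disability status",
  age_group = "age group",
  teacher = "role"
)

# imap passes each value as .x and its name as .y
all_data_grouped <- imap(tags, ~ count_by_column(data, variable = .y, group_tag = .x)) %>%
  bind_rows()
```

Keeping each tag next to its column name makes it harder for the two to drift out of sync when a variable is added or removed.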