Passing multiple columns from a function's argument to group_by


Consider the following example:

library(tidyverse)

df <- tibble(
  cat = rep(1:2, times = 4, each = 2),
  loc = rep(c("a", "b"), each = 8),
  value = rnorm(16)
)

df %>% 
  group_by(cat, loc) %>% 
  summarise(mean = mean(value), .groups = "drop")

# # A tibble: 4 x 3
# cat loc     mean
# * <int> <chr>  <dbl>
# 1     1 a     -0.563
# 2     1 b     -0.394
# 3     2 a      0.159
# 4     2 b      0.212

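I would like to create a function for the last two lines that takes a group argument to pass multiple columns to group_by. Here is a dummy function that computes the mean of value over a set of columns, as an example: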

group_mean <- function(data, col_value, group) {
  data %>% 
    group_by(across(all_of(group))) %>% 
    summarise(mean = mean({{col_value}}), .groups = "drop")
}

group_mean(df, value, c("cat", "loc"))

# # A tibble: 4 x 3
# cat loc     mean
# * <int> <chr>  <dbl>
# 1     1 a     -0.563
# 2     1 b     -0.394
# 3     2 a      0.159
# 4     2 b      0.212

The function works, but I would prefer a tidyselect/rlang approach that avoids quoting the column names, like this:

group_mean(df, value, c(cat, loc))

# Error: Problem adding computed columns in `group_by()`.
# x Problem with `mutate()` input `..1`.
# x object 'loc' not found
# ℹ Input `..1` is `across(all_of(c(cat, loc)))`.

Wrapping group in {{}} works for a single column, but not for multiple columns. How can I make this work?

r tidyverse
2 answers
5
votes

Consider using ... instead. After converting the arguments to symbols with ensyms, we can pass the column names either quoted or unquoted:

group_mean <- function(data, col_value, ...) {
  data %>% 
    group_by(!!! ensyms(...)) %>% 
    summarise(mean = mean({{col_value}}), .groups = "drop")
}

Testing:

> group_mean(df, value, cat, loc)
# A tibble: 4 x 3
    cat loc     mean
  <int> <chr>  <dbl>
1     1 a      0.327
2     1 b     -0.291
3     2 a     -0.382
4     2 b     -0.320
> group_mean(df, value, 'cat', 'loc')
# A tibble: 4 x 3
    cat loc     mean
  <int> <chr>  <dbl>
1     1 a      0.327
2     1 b     -0.291
3     2 a     -0.382
4     2 b     -0.320
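
To see why both call styles give the same result, here is a minimal sketch of what ensyms does with its inputs. The helper name capture_groups is hypothetical and only for illustration; it is not part of the answer above.

```r
library(rlang)

# Hypothetical helper: capture the arguments passed through ... as symbols
# and return their names as a plain character vector.
capture_groups <- function(...) {
  syms <- ensyms(...)  # one symbol per argument; strings are converted to symbols
  unname(vapply(syms, as_string, character(1)))
}

unquoted <- capture_groups(cat, loc)      # bare column names
quoted   <- capture_groups("cat", "loc")  # strings

identical(unquoted, quoted)
```

Because ensyms only accepts symbols or strings, this approach takes the columns as separate arguments rather than as a single c(cat, loc) expression.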

If ... is already being used for other arguments, then one option is:

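group_mean <- function(data, col_value, group) {
  # Capture the unevaluated argument, e.g. c(cat, loc), and split it into parts
  grp_lst <- as.list(substitute(group))
  # For a call such as c(cat, loc), the first part is the function name c; drop it
  if (length(grp_lst) > 1) grp_lst <- grp_lst[-1]
  # Convert the remaining symbols to column-name strings
  grps <- purrr::map_chr(grp_lst, rlang::as_string)
  data %>% 
    group_by(across(all_of(grps))) %>% 
    summarise(mean = mean({{col_value}}), .groups = "drop")
}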

Testing:

> group_mean(df, value, c(cat, loc))
# A tibble: 4 x 3
    cat loc     mean
  <int> <chr>  <dbl>
1     1 a      0.327
2     1 b     -0.291
3     2 a     -0.382
4     2 b     -0.320
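
As a rough illustration of the substitute step, the sketch below isolates the first lines of that function in base R. The helper name show_groups is hypothetical, and deparse stands in for rlang::as_string so the snippet needs no packages.

```r
# Hypothetical helper mirroring the argument-capturing part of the function
show_groups <- function(group) {
  parts <- as.list(substitute(group))              # c(cat, loc) becomes list(c, cat, loc)
  if (length(parts) > 1) parts <- parts[-1]        # drop the leading function name c
  vapply(parts, deparse, character(1))             # symbols to strings
}

multi  <- show_groups(c(cat, loc))  # several columns wrapped in c()
single <- show_groups(cat)          # a single bare column name
```

Note that this relies on the argument being written literally as a bare name or a c(...) call; a selection helper such as starts_with("c") would not be handled by this parsing.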

0
votes

This is now quicker and easier with across(). Using your example:

group_mean <- function(data, col_value, group) {
  data %>% 
    group_by(across({{group}})) %>% 
    summarise(mean = mean({{col_value}}), .groups = "drop")
}

group_mean(df, value, group=c(cat, loc))
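
Since across() takes a tidyselect specification, the same function should also accept a single bare column or a character vector wrapped in all_of(). The following is a self-contained sketch under that assumption; the fixed seed and the result names by_bare, by_string and by_single are illustrative only.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- tibble(
  cat = rep(1:2, times = 4, each = 2),
  loc = rep(c("a", "b"), each = 8),
  value = rnorm(16)
)

group_mean <- function(data, col_value, group) {
  data %>%
    group_by(across({{ group }})) %>%   # forward the selection to across()
    summarise(mean = mean({{ col_value }}), .groups = "drop")
}

by_bare   <- group_mean(df, value, group = c(cat, loc))               # unquoted names
by_string <- group_mean(df, value, group = all_of(c("cat", "loc")))   # character vector
by_single <- group_mean(df, value, group = cat)                       # one column
```

In dplyr 1.1.0 and later, pick() can be used in place of across() inside group_by() for the same purpose.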