Pass arguments to dplyr functions

Question

I want to parameterise the following computation using dplyr, which finds the values of Sepal.Length that are associated with more than one value of Sepal.Width:

library(dplyr)

iris %>%
    group_by(Sepal.Length) %>%
    summarise(n.uniq=n_distinct(Sepal.Width)) %>%
    filter(n.uniq > 1)

Normally I would write something like this:

not.uniq.per.group <- function(data, group.var, uniq.var) {
    data %>%
        group_by(group.var) %>%
        summarise(n.uniq=n_distinct(uniq.var)) %>%
        filter(n.uniq > 1)
}

However, this approach throws errors because dplyr uses non-standard evaluation. How should this function be written?
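To make the failure concrete, here is a minimal sketch of what happens with the naive wrapper, assuming dplyr 1.0 or later. The name `naive` is made up for illustration. With bare column names the call errors; with strings it runs but silently computes the wrong thing, because each string is treated as a constant rather than as a column name.

```r
library(dplyr)

# Hypothetical name: the naive wrapper from the question.
naive <- function(data, group.var, uniq.var) {
    data %>%
        group_by(group.var) %>%
        summarise(n.uniq = n_distinct(uniq.var)) %>%
        filter(n.uniq > 1)
}

# Bare names: dplyr finds no column called `group.var`, falls back to the
# function argument, and R then cannot find an object named Sepal.Length.
bare <- tryCatch(
    naive(iris, Sepal.Length, Sepal.Width),
    error = function(e) e
)
inherits(bare, "error")

# Strings: no error, but "Sepal.Length" becomes a constant grouping value
# and n_distinct("Sepal.Width") is always 1, so every row is filtered out.
quoted <- naive(iris, "Sepal.Length", "Sepal.Width")
nrow(quoted)
```

Either way, the column names never reach dplyr as column references, which is the problem the answers below address.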

Answer 1

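You need to use the standard evaluation versions of the dplyr functions (just append '_' to the function names, i.e. group_by_ and summarise_) and pass strings to your function, which you then need to turn into symbols. To parameterise the argument of summarise_, you will need to use interp(), which is defined in the lazyeval package. Concretely:

library(dplyr)
library(lazyeval)

not.uniq.per.group <- function(df, grp.var, uniq.var) {
    df %>%
        group_by_(grp.var) %>%
        summarise_(
            # build the formula ~n_distinct(<uniq.var>) from the string
            n_uniq=interp(~n_distinct(v), v=as.name(uniq.var))
        ) %>%
        filter(n_uniq > 1)
}

not.uniq.per.group(iris, "Sepal.Length", "Sepal.Width")

Note that in recent versions of dplyr, the standard evaluation versions of the dplyr functions have been "soft deprecated" in favor of non-standard evaluation.

See the Programming with dplyr vignette for more information on working with non-standard evaluation.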
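Since the underscore functions are deprecated, a string-based interface can instead be sketched with tools from current dplyr. This is not part of the original answer: it assumes dplyr 1.0 or later, and the function name `not_uniq_chr` is made up for illustration. It uses `across(all_of())` to group by a column named in a string and the `.data` pronoun to look a column up by name.

```r
library(dplyr)

# Hypothetical helper: same idea as above, but without group_by_(),
# summarise_() or lazyeval. Both column arguments are strings.
not_uniq_chr <- function(df, grp.var, uniq.var) {
    df %>%
        # all_of() selects the column whose name is stored in grp.var
        group_by(across(all_of(grp.var))) %>%
        # .data[[...]] looks up a column of the current data by name
        summarise(n_uniq = n_distinct(.data[[uniq.var]])) %>%
        filter(n_uniq > 1)
}

res <- not_uniq_chr(iris, "Sepal.Length", "Sepal.Width")
names(res)
```

The call signature stays the same as in the lazyeval version, so existing callers that pass strings would not need to change.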

Answer 2

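Like the old dplyr versions up to 0.5, the new dplyr has facilities for both standard evaluation (SE) and non-standard evaluation (NSE), but they are expressed differently than before.

If you want an NSE function, you pass bare expressions and use enquo to capture them as quosures. If you want an SE function, just pass quosures (or symbols) directly, then unquote them in the dplyr call. Here is the SE solution to the question:

library(tidyverse)
library(rlang)

f1 <- function(df, grp.var, uniq.var) {
    df %>%
        group_by(!!grp.var) %>%
        summarise(n_uniq = n_distinct(!!uniq.var)) %>%
        filter(n_uniq > 1)
}

a <- f1(iris, quo(Sepal.Length), quo(Sepal.Width))
b <- f1(iris, sym("Sepal.Length"), sym("Sepal.Width"))
identical(a, b)
#> [1] TRUE

Note how the SE version enables you to work with string arguments: just convert them to symbols first using sym(). For more information, see the programming with dplyr vignette.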

Answer 3

In the devel version of dplyr (soon to be released as 0.6.0), we can also use a slightly different syntax for passing the variables.

f1 <- function(df, grp.var, uniq.var) {
    grp.var <- enquo(grp.var)
    uniq.var <- enquo(uniq.var)

    df %>%
        group_by(!!grp.var) %>%
        summarise(n_uniq = n_distinct(!!uniq.var)) %>%
        filter(n_uniq > 1)
}

res2 <- f1(iris, Sepal.Length, Sepal.Width)
res1 <- not.uniq.per.group(iris, "Sepal.Length", "Sepal.Width")
identical(res1, res2)
#[1] TRUE

Here enquo takes the argument and returns it as a quosure (similar to substitute in base R) by evaluating the function argument lazily. Inside summarise, we ask it to unquote (!! or UQ) so that it gets evaluated.
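The comparison between enquo and substitute can be checked directly. The following is a small sketch, assuming the rlang package is installed; `show_capture` is a made-up helper that only reports what was captured and does not evaluate it.

```r
library(rlang)

# Hypothetical helper: capture the argument without evaluating it.
show_capture <- function(x) {
    q <- enquo(x)  # quosure: the expression plus the caller's environment
    list(
        is_quosure = is_quosure(q),
        label = as_label(q),                      # text of the expression
        base_equivalent = deparse(substitute(x))  # base R counterpart
    )
}

# Sepal.Length does not exist as an object here; that is fine because
# the argument is only captured, never evaluated.
out <- show_capture(Sepal.Length)
out$label
```

Unlike substitute, the quosure also records the environment the expression came from, which is what lets dplyr evaluate it correctly once it is unquoted with !!.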

Answer 4

In the current version of dplyr (0.7.4), the standard evaluation versions of the functions (those with '_' appended to the function name, e.g. group_by_) are deprecated. Instead, you should rely on tidyeval when writing functions.

Here is an example of how your function would look:

# definition of your function
not.uniq.per.group <- function(data, group.var, uniq.var) {
    # enquotes variables to be used with dplyr-functions
    group.var <- enquo(group.var)
    uniq.var <- enquo(uniq.var)

    # use '!!' before parameter names in dplyr-functions
    data %>%
        group_by(!!group.var) %>%
        summarise(n.uniq=n_distinct(!!uniq.var)) %>%
        filter(n.uniq > 1)
}

# call of your function
not.uniq.per.group(iris, Sepal.Length, Sepal.Width)

If you want to learn all about the details, there is an excellent vignette by the dplyr team on how this works.

Answer 5

In the past I wrote a function that does something similar to what you are doing, except that it explores all the columns outside the primary key and looks for multiple unique values per group.

find_dups = function(.table, ...) {
    require(dplyr)
    require(tidyr)

    # get column names of primary key
    pk <- .table %>% select(...) %>% names
    other <- names(.table)[!(names(.table) %in% pk)]

    # group by primary key,
    # get number of rows per unique combo,
    # filter for duplicates,
    # get number of distinct values in each column,
    # gather to get df of 1 row per primary key, other column,
    # filter for where a columns have more than 1 unique value,
    # order table by primary key
    .table %>%
        group_by(...) %>%
        mutate(cnt = n()) %>%
        filter(cnt > 1) %>%
        select(-cnt) %>%
        summarise_each(funs(n_distinct)) %>%
        gather_('column', 'unique_vals', other) %>%
        filter(unique_vals > 1) %>%
        arrange(...) %>%
        return

    # Final dataframe:
    ## One row per primary key and column that creates duplicates.
    ## Last column indicates how many unique values of
    ## the given column exist for each primary key.
}

This function also works with the pipe operator:

dat %>% find_dups(key1, key2)

Answer 6

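You can avoid lazyeval by using do to call an anonymous function and then using get. This solution can be used more generally to apply multiple aggregations. I usually write the function separately.

library(dplyr)

not.uniq.per.group <- function(df, grp.var, uniq.var) {
    df %>%
        group_by_(grp.var) %>%
        # the anonymous function receives each group as `.` and
        # looks the column up by name with get()
        do((function(., uniq.var) {
            with(., data.frame(n_uniq = n_distinct(get(uniq.var))))
        }
        )(., uniq.var)) %>%
        filter(n_uniq > 1)
}

not.uniq.per.group(iris, "Sepal.Length", "Sepal.Width")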

Answer 7

Here is the way to do it from rlang 0.4 onwards, using the curly-curly {{ pseudo-operator:

library(dplyr)

not.uniq.per.group <- function(data, group.var, uniq.var) {
    data %>%
        group_by({{group.var}}) %>%
        summarise(n.uniq=n_distinct({{uniq.var}})) %>%
        filter(n.uniq > 1)
}

iris %>% not.uniq.per.group(Sepal.Length, Sepal.Width)
#> # A tibble: 25 x 2
#>    Sepal.Length n.uniq
#>           <dbl>  <int>
#>  1          4.4      3
#>  2          4.6      4
#>  3          4.8      3
#>  4          4.9      5
#>  5          5        8
#>  6          5.1      6
#>  7          5.2      4
#>  8          5.4      4
#>  9          5.5      6
#> 10          5.6      5
#> # ... with 15 more rows
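The curly-curly operator is shorthand for the enquo-then-unquote pattern used in the earlier answers. A quick sketch to check that the two forms agree, assuming rlang 0.4 or later and a dplyr version that supports {{ }}; the names `with_enquo` and `with_curly` are made up for this comparison.

```r
library(dplyr)

# Explicit two-step form: capture with enquo(), unquote with !!.
with_enquo <- function(data, group.var, uniq.var) {
    group.var <- enquo(group.var)
    uniq.var <- enquo(uniq.var)
    data %>%
        group_by(!!group.var) %>%
        summarise(n.uniq = n_distinct(!!uniq.var)) %>%
        filter(n.uniq > 1)
}

# One-step form: {{ }} captures and unquotes in a single operation.
with_curly <- function(data, group.var, uniq.var) {
    data %>%
        group_by({{ group.var }}) %>%
        summarise(n.uniq = n_distinct({{ uniq.var }})) %>%
        filter(n.uniq > 1)
}

a <- with_enquo(iris, Sepal.Length, Sepal.Width)
b <- with_curly(iris, Sepal.Length, Sepal.Width)
all.equal(a, b)
```

The explicit form remains useful when the captured quosure has to be inspected or modified before it is unquoted; otherwise {{ }} is the shorter option.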
