Passing data variables to an R formula

Question

Suppose I want to write

anscombe %>% lm_tidy("x1", "y1")

(Actually, I would like to write `anscombe %>% lm_tidy(x1, y1)`, where `x1` and `y1` are columns of the data frame.) Since the following function seems to work:

plot_gg <- function(df, x, y) {
  x <- enquo(x)
  y <- enquo(y)
  ggplot(df, aes(x = !!x, y = !!y)) + geom_point() +
    geom_smooth(formula = y ~ x, method="lm", se = FALSE)
}

I started writing the following function:

lm_tidy_1 <- function(df, x, y) {
  x <- enquo(x)
  y <- enquo(y)
  fm <- y ~ x            ##### I tried many things here!
  lm(fm, data=df)
}
## Error in model.frame.default(formula = fm, data = df, drop.unused.levels = TRUE) : 
##   object is not a matrix

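A comment on passing in column name as argument points out that embracing with `{{...}}` is shorthand for the quote-unquote pattern. Unfortunately, the error messages differ between the two cases: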

lm_tidy_2 <- function(df, x, y) {
  fm <- !!enquo(y) ~ !!enquo(x) # alternative: {{y}} ~ {{x}} with different errors!!
  lm(fm, data=df)
}
## Error:
## ! Quosures can only be unquoted within a quasiquotation context.

This seems to work (based on @jubas's answer, but it still relies on string handling and `paste`):

lm_tidy_str <- function(df, x, y) {
  fm <- formula(paste({{y}}, "~", {{x}}))
  lm(fm, data=df)
}

Once again, `{{y}} != !!enquo(y)`. Worse still, the following function fails with the same `Quosure` error as before:

lm_tidy_str_1 <- function(df, x, y) {
  x <- enquo(x)
  y <- enquo(y)
  fm <- formula(paste(!!y, "~", !!x))
  lm(fm, data=df)
}

Is `{{y}} != !!enquo(y)`?
How can I pass data variables to `lm`?
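A short base R sketch of why the two spellings behave differently here: `{{ }}` and `!!` only have special meaning inside functions that implement rlang's quasiquotation (such as `rlang::expr()`, `rlang::quo()` or dplyr verbs). `paste()`, `formula()` and `lm()` are base R, so to them these are ordinary syntax: nested braces and a double negation. With a quosure as the operand, that negation is presumably what triggers rlang's "Quosures can only be unquoted" error (an assumption about rlang's internals, not demonstrated below). The function name `embrace` is illustrative.

```r
# In base R, `{{ y }}` is just two nested `{` blocks: it evaluates `y` and returns its value.
embrace <- function(y) {{ y }}
embrace("mpg")   # returns "mpg", which is why paste({{y}}, "~", {{x}}) works with strings

# `!!y` is parsed as two nested negations, `!(!y)`; base R performs no unquoting.
bang <- quote(!!y)
bang[[1]]        # the function `!`
bang[[2]]        # the inner call !y
```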

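Edit: Sorry, many of my attempts contained leftovers from earlier tries. I want to pass data variables (e.g. `x1` and `y1`) directly to functions that use them as formula components (e.g. `lm`), rather than their string versions (`"x1"` and `"y1"`): I try to avoid strings as much as possible, since that is leaner from the user's point of view.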

Answer 1

Consider:

lm_tidy_1 <- function(df, x, y) {
  fm <- reformulate(as.character(substitute(x)), substitute(y))
  lm(fm, data=df)
}

lm_tidy_1(iris, Species, Sepal.Length)
lm_tidy_1(iris, 'Species', Sepal.Length)
lm_tidy_1(iris, Species, 'Sepal.Length')
lm_tidy_1(iris, 'Species', 'Sepal.Length')
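To see what `lm_tidy_1` hands to `lm()`, the formula-building line can be run on its own. A minimal sketch (the helper name `build_formula` is hypothetical): `substitute()` returns the unevaluated argument, a symbol for a bare name and a character string for a quoted one; `as.character()` turns either into a string for `reformulate()`, which accepts the response as a symbol or a string.

```r
# Hypothetical helper that isolates the formula-building step of lm_tidy_1
build_formula <- function(x, y) {
  # substitute() returns the unevaluated argument:
  # a symbol for Species, a character string for 'Species'
  reformulate(as.character(substitute(x)), substitute(y))
}

f_bare   <- build_formula(Species, Sepal.Length)
f_quoted <- build_formula('Species', 'Sepal.Length')
deparse(f_bare)    # "Sepal.Length ~ Species"
deparse(f_quoted)  # "Sepal.Length ~ Species"
```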

Edit:

If you want the actual formula to appear in the printed call, modify the call object:

lm_tidy_1 <- function(df, x, y) {
  fm <- reformulate(as.character(substitute(x)), substitute(y))
  res <- lm(fm, data = df)
  # element 2 of the stored call is the `formula` argument; put the formula itself there
  res$call[[2]] <- fm
  res
}

lm_tidy_1(iris, Species, Sepal.Length) 

Call:
lm(formula = Sepal.Length ~ Species, data = df)

Coefficients:
      (Intercept)  Speciesversicolor   Speciesvirginica  
            5.006              0.930              1.582
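A small variation on the edit above, as a sketch: because `lm()` stores its call via `match.call()`, the formula argument is also reachable by name, so `res$call$formula <- fm` does the same job as `res$call[[2]] <- fm` without depending on argument order. The function name `lm_tidy_named` is hypothetical.

```r
lm_tidy_named <- function(df, x, y) {
  fm <- reformulate(as.character(substitute(x)), substitute(y))
  res <- lm(fm, data = df)
  # res$call was built by match.call(), so its formula argument is named `formula`;
  # replacing it by name avoids relying on its position in the call
  res$call$formula <- fm
  res
}

fit <- lm_tidy_named(iris, Species, Sepal.Length)
fit$call$formula  # Sepal.Length ~ Species
```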

Answer 2

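@BiranSzydek's answer is nice. However, it has three drawbacks:

1. The printed call is

   Call:
   lm(formula = fm, data = .)

   so you cannot see the formula and data that were actually used.
2. The symbols must be entered as strings.
3. It depends on `rlang`, even though that is a great package.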

You can indeed solve this in pure base R!

Solution in pure base R

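Under the hood, R is really a Lisp, which makes it well suited to metaprogramming tasks like this one. R's only drawback is its dreadful syntax: especially for metaprogramming, it is not as clean and elegant as a Lisp. The syntax really does cause a lot of confusion, as you experienced first-hand while trying to solve this problem.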

The solution is to use `substitute()`, which lets you replace pieces of quoted (unevaluated) code:

lm_tidy <- function(df, x, y) {
  # capture the arguments as code (unevaluated expressions) instead of evaluating them
  .x <- substitute(x)
  .y <- substitute(y)
  .df <- substitute(df)
  # take the code `y ~ x` and replace its symbols using the lookup list
  .fm <- substitute(y ~ x, list(y=.y, x=.x))
  # take the code `lm(fm, data=df)` and replace `fm` and `df` with the
  # code pieces stored in `.fm` and `.df` (the lookup list),
  # then evaluate the result in the parent environment, i.e. where the function was called
  eval.parent(substitute(lm(fm, data=df), list(fm=.fm, df=.df)))
}

The trick is to use

eval.parent(substitute(<your expression>, <a list that serves as the lookup table for the variables in your expression>))
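The two substitution steps inside `lm_tidy` can be run one at a time at the top level to see what actually gets evaluated. This sketch hard-codes the symbols that `substitute(x)`, `substitute(y)` and `substitute(df)` would capture for the call `lm_tidy(mtcars, cyl, mpg)`; the variable names `fm_call` and `lm_call` are illustrative.

```r
# Step 1: rewrite the symbols y and x in the unevaluated expression `y ~ x`
fm_call <- substitute(y ~ x, list(y = quote(mpg), x = quote(cyl)))
fm_call   # mpg ~ cyl (still an unevaluated call)

# Step 2: splice that formula and the data name into the lm() call
lm_call <- substitute(lm(fm, data = df), list(fm = fm_call, df = quote(mtcars)))
lm_call   # lm(mpg ~ cyl, data = mtcars)

# Step 3: evaluate; inside lm_tidy this happens via eval.parent()
fit <- eval(lm_call)
coef(fit)
```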

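Beware of scoping! As long as every variable in `<your expression>` is replaced through the lookup list passed to `substitute()` (the list values may be computed inside the function), there will be no scoping problems. But avoid referring to any other variables in `<your expression>`! That is the only rule you must follow to use `eval()`/`eval.parent()` safely in this situation. Even then, keep in mind that with `eval.parent()` the substituted code is executed in the environment from which the function was called.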

Now you can call:

lm_tidy(mtcars, cyl, mpg)

The output is now as desired:

Call:
lm(formula = mpg ~ cyl, data = mtcars)

Coefficients:
(Intercept)          cyl  
     37.885       -2.876

We did it in pure base R!

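The trick to using `eval()` safely is that every variable in the `substitute()` expression is given a value in the lookup list passed to `substitute()` (the list entries themselves may be built from the function's arguments). In other words, none of the substituted variables refers to a dangling variable that would have to be resolved outside the function definition.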
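A minimal sketch of that rule, with hypothetical functions `count_bad` and `count_good`: both build `nrow(df) * multiplier` and evaluate it in the caller's environment. `count_bad` leaves the local variable `multiplier` out of the lookup list, so it stays a bare symbol and is looked up in the caller's environment, where (assuming the caller has no variable of that name) it does not exist. `count_good` passes it through the list, so the substituted expression contains its value.

```r
count_bad <- function(df) {
  multiplier <- 2  # local variable, NOT in the lookup list
  # `multiplier` stays a bare symbol and is looked up in the caller's environment
  eval.parent(substitute(nrow(df) * multiplier, list(df = substitute(df))))
}

count_good <- function(df) {
  multiplier <- 2
  # every symbol to be replaced comes from the lookup list, so nothing dangles
  eval.parent(substitute(nrow(df) * multiplier,
                         list(df = substitute(df), multiplier = multiplier)))
}

count_good(mtcars)                                     # 64 (mtcars has 32 rows)
tryCatch(count_bad(mtcars), error = conditionMessage)  # "object 'multiplier' not found"
```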

The `plot_gg` function

So, following these rules, your `plot_gg` function would be defined as:

plot_gg <- function(df, x, y) {
  .x <- substitute(x)
  .y <- substitute(y)
  .df <- substitute(df)
  # geom_smooth() expects its formula in terms of the aesthetics (literally y ~ x),
  # not the column names; values taken from the lookup list are not substituted again
  .fm <- quote(y ~ x)
  eval.parent(substitute(
    ggplot(df, aes(x=x, y=y)) + geom_point() +
      geom_smooth(formula = fm, method="lm", se=FALSE),
    list(fm=.fm, x=.x, y=.y, df=.df)
  ))
}
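Because `substitute()` only returns an expression, the substituted ggplot code can be inspected without evaluating it (and without ggplot2 loaded). The sketch below, with the hypothetical helper `plot_call`, builds the same kind of expression for `mtcars`, `cyl` and `mpg`. It also shows that `substitute()` makes a single pass: a value taken from the lookup list, here `fm = quote(y ~ x)`, is inserted as-is and its own `x` and `y` are not replaced. That matters for `geom_smooth()`, whose `formula` refers to the aesthetics `x` and `y` rather than to the data's column names (compare the question's original `plot_gg`).

```r
plot_call <- function(df, x, y) {
  substitute(
    ggplot(df, aes(x = x, y = y)) + geom_point() +
      geom_smooth(formula = fm, method = "lm", se = FALSE),
    list(df = substitute(df), x = substitute(x), y = substitute(y),
         fm = quote(y ~ x))  # inserted verbatim; not substituted a second time
  )
}

e <- plot_call(mtcars, cyl, mpg)
print(e)              # inspect the unevaluated expression
e[[3]]$formula        # y ~ x
e[[2]][[2]][[3]]      # aes(x = cyl, y = mpg)
```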

If you want to pass `x` and `y` as strings:

lm_tidy_str <- function(df, x, y) {
  .x <- as.name(x)
  .y <- as.name(y)
  .df <- substitute(df)
  .fm <- substitute(y ~ x, list(y=.y, x=.x))
  eval.parent(substitute(lm(fm, data=df), list(fm=.fm, df=.df)))
}

plot_gg_str <- function(df, x, y) {
  .x <- as.name(x)
  .y <- as.name(y)
  .df <- substitute(df)
  # geom_smooth() expects its formula in terms of the aesthetics (literally y ~ x),
  # not the column names; values taken from the lookup list are not substituted again
  .fm <- quote(y ~ x)
  eval.parent(substitute(
    ggplot(df, aes(x=x, y=y)) + geom_point() +
      geom_smooth(formula = fm, method="lm", se=FALSE),
    list(fm=.fm, x=.x, y=.y, df=.df)
  ))
}

lm_tidy_str(mtcars, "cyl", "mpg")

# Call:
# lm(formula = mpg ~ cyl, data = mtcars)
# 
# Coefficients:
# (Intercept)          cyl  
#      37.885       -2.876  
# 

require(ggplot2)
plot_gg_str(mtcars, "cyl", "mpg")
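The string versions differ from the bare-name versions only in how `.x` and `.y` are obtained. A tiny check of that (the helper `capture` and the variable names are illustrative): `as.name()` turns a string into the same symbol object that `substitute()` captures from a bare argument, so the rest of the function body can stay unchanged.

```r
capture <- function(arg) substitute(arg)   # returns the unevaluated argument
identical(capture(mpg), as.name("mpg"))    # TRUE: both are the symbol `mpg`

via_string <- substitute(y ~ x, list(y = as.name("mpg"), x = as.name("cyl")))
via_bare   <- substitute(y ~ x, list(y = capture(mpg), x = capture(cyl)))
identical(via_string, via_bare)            # TRUE
```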

Answer 3

Wrap the formula in `expr()` and pass it to `lm()`, which evaluates it:

library(dplyr)
lm_tidy <- function(df, x, y) {
  x <- sym(x)
  y <- sym(y)
  fm <- expr(!!y ~ !!x)
  lm(fm, data = df)
}

This function is equivalent to:

lm_tidy <- function(df, x, y) {
  fm <- expr(!!sym(y) ~ !!sym(x))
  lm(fm, data = df)
}

Then

lm_tidy(mtcars, "cyl", "mpg")

gives

Call:
lm(formula = fm, data = .)

Coefficients:
(Intercept)          cyl  
     37.885       -2.876
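For comparison, base R has its own quasiquotation helper, `bquote()`, in which `.()` plays roughly the role of rlang's `!!`. This is a swapped-in base R technique, not part of the answer above; a sketch with the hypothetical function names `lm_tidy_bquote` (strings) and `lm_tidy_bquote_sym` (bare names):

```r
lm_tidy_bquote <- function(df, x, y) {
  # .() splices the value of its argument into the quoted expression,
  # much like !! does inside rlang::expr()
  fm <- bquote(.(as.name(y)) ~ .(as.name(x)))
  lm(fm, data = df)
}

# Bare-name variant: capture the unevaluated arguments first, then splice them in
lm_tidy_bquote_sym <- function(df, x, y) {
  .x <- substitute(x)
  .y <- substitute(y)
  fm <- bquote(.(.y) ~ .(.x))
  lm(fm, data = df)
}

fit_str <- lm_tidy_bquote(mtcars, "cyl", "mpg")
fit_sym <- lm_tidy_bquote_sym(mtcars, cyl, mpg)
coef(fit_sym)   # (Intercept) 37.885, cyl -2.876 (rounded)
```

As with `expr()`, the printed call of these fits still shows `formula = fm`.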

Edit following the comments:

library(rlang)
lm_tidy_quo <- function(df, x, y){
    y <- enquo(y)
    x <- enquo(x)
    fm <- paste(quo_text(y), "~", quo_text(x))
    lm(fm, data = df)
}

You can then pass symbols as arguments:

lm_tidy_quo(mtcars, cyl, mpg)
