Setting multiple columns in an R `data.table` with a named list and `:=`

Question

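Using `:=` to create new columns is one of my favorite data.table features. I know of two ways to use it to add several columns at once. Here is a simple example: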

dt <- data.table("widths" = seq(2, 10, 2), "heights" = 8:4)
dt
   widths heights
1:      2       8
2:      4       7
3:      6       6
4:      8       5
5:     10       4

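Suppose I want to add two columns, one for area and one for perimeter. The first way is to pass a character vector of column names on the left-hand side, for example: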

new_cols <- c("areas", "perimeters")

my_fun <- function(x, y){
  areas <- x * y
  perimeters <- 2*(x + y)
  return(list(areas = areas, perimeters = perimeters))
}

dt[ , (new_cols) := my_fun(widths, heights)]
dt
   widths heights areas perimeters
1:      2       8   16        20
2:      4       7   28        22
3:      6       6   36        24
4:      8       5   40        26
5:     10       4   40        28

Equivalently, we can use the functional form of `:=`, like so:

dt[ , `:=`("areas" = widths * heights, "perimeters" = 2*(widths + heights))]

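Both approaches require the names of the new columns to be supplied up front. You can type them by hand, store them in an object before creating the columns, or put a function that generates the names on the left-hand side of `:=`. What I don't know is a way to get both the names and the values passed to `:=` from a single call.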

Is there a way to do this? Here is an example of what I would like to be able to write:

dt[ , (new_cols) := NULL] # delete the previously added area and perimeter cols.
dt[ , `:=`(my_fun(widths, heights))]
dt
   widths heights areas perimeters
1:      2       8   16        20
2:      4       7   28        22
3:      6       6   36        24
4:      8       5   40        26
5:     10       4   40        28

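Ideally, there would be a way for `:=` to see the names returned by `my_fun()` and use them as the names of the new columns. I know the code above produces an error, but I am wondering whether there is a simple way to get the desired behavior, since it would be useful for larger problems with many columns or where the column names depend on the function's inputs.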

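Edit: The key thing I am looking for is a way to assign these columns by reference, i.e. using `:=` or `set()`, and I also want the result to keep the class `data.table`.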

Answer 1

Too long for a comment. Not pretty:

dt[, {
    a <- my_fun(widths, heights)   
    for (x in names(a))
        set(dt, j=x, value=a[[x]])
}]

Or, if you wrote the function yourself, could you pass `dt` into the function?
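The idea of handing `dt` to a function could look roughly like the sketch below. `set_named_cols()` is a hypothetical helper written for this illustration, not part of data.table; it only relies on `set()` accepting a column name in `j`, and it assumes every element of the list is named and has a length compatible with the table.

```r
library(data.table)

# Hypothetical helper (not part of data.table): add each element of a
# named list to `dt` as a column, by reference, using set().
set_named_cols <- function(dt, new_cols) {
  stopifnot(is.data.table(dt), is.list(new_cols), !is.null(names(new_cols)))
  for (nm in names(new_cols)) {
    set(dt, j = nm, value = new_cols[[nm]])
  }
  invisible(dt)  # return the same table invisibly so calls can be chained
}

dt <- data.table(widths = seq(2, 10, 2), heights = 8:4)

my_fun <- function(x, y) {
  list(areas = x * y, perimeters = 2 * (x + y))
}

# The column names come from the list returned by my_fun()
set_named_cols(dt, my_fun(dt$widths, dt$heights))
names(dt)
```

Because the helper loops over `names(new_cols)`, the caller never has to spell out the column names, and `dt` remains a `data.table`.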

Answer 2

I don't think this is what you are looking for, but it does work.

data.frame(dt, my_fun(dt$widths, dt$heights))

#  widths heights areas perimeters
#1      2       8    16         20
#2      4       7    28         22
#3      6       6    36         24
#4      8       5    40         26
#5     10       4    40         28

Unfortunately, `data.table(dt, my_fun(dt$widths, dt$heights))` does not work.

Answer 3

Until #1543 is resolved, if you are not constrained by execution time or memory, you could consider using `cbind` and reassigning:

library(data.table)
dt <- data.table("widths" = seq(2, 10, 2), "heights" = 8:4)

my_fun <- function(x, y){
  areas <- x * y
  perimeters <- 2*(x + y)
  return(list(areas = areas, perimeters = perimeters))
}

dt <- dt[, cbind(.SD, as.data.table(my_fun(widths, heights)))]
dt
#>    widths heights areas perimeters
#>     <num>   <int> <num>      <num>
#> 1:      2       8    16         20
#> 2:      4       7    28         22
#> 3:      6       6    36         24
#> 4:      8       5    40         26
#> 5:     10       4    40         28

dt <- dt[, cbind(.SD, better.name = as.data.table(my_fun(widths, heights)))]
dt
#>    widths heights areas perimeters better.name.areas better.name.perimeters
#>     <num>   <int> <num>      <num>             <num>                  <num>
#> 1:      2       8    16         20                16                     20
#> 2:      4       7    28         22                28                     22
#> 3:      6       6    36         24                36                     24
#> 4:      8       5    40         26                40                     26
#> 5:     10       4    40         28                40                     28

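This copies the entire data.table and does not update existing columns that have the same name, but on the other hand it lets you add a prefix to all of the new variables.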
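If the copy is a concern, a two-step variant can keep the assignment by reference. This is a minimal sketch rather than a built-in feature: it evaluates the function once, stores the result in an intermediate object (`res`, a name chosen here for illustration), and then uses the names of that object on the left-hand side of `:=`.

```r
library(data.table)

dt <- data.table(widths = seq(2, 10, 2), heights = 8:4)

my_fun <- function(x, y) {
  list(areas = x * y, perimeters = 2 * (x + y))
}

# Step 1: evaluate the function once; a list returned in j becomes a data.table
res <- dt[, my_fun(widths, heights)]

# Step 2: assign by reference, taking the column names from the result
new_names <- names(res)
dt[, (new_names) := res]

# Optional: the same result under prefixed names
prefixed <- paste0("better.name.", names(res))
dt[, (prefixed) := res]

names(dt)
```

Unlike the `cbind` version, repeating step 2 assigns to the existing `areas` and `perimeters` columns instead of appending columns with duplicate names, at the cost of needing the intermediate object.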
