Consider the following minimal working example:
library(magrittr) # for the %>% pipe
library(data.table)
# test data.table contains common_column and two others
test_dt <- data.table(test_column_one = c(1, 2, 3), test_column_two = c("x","y","z"), common_column = c("ID1", "ID2", "ID3") )
# some other data.table that contains common_column
other_dt <- data.table( additional_info = c("US", "US", "GB"), common_column = c("ID1", "ID2", "ID3"))
example_function <- function(dt_column){
  # does some things on the data tables based on the column parameter passed
  merged_dt <- merge(other_dt, test_dt[,.(common_column, dt_column)], by = "common_column") %>%
    .[order(dt_column),] # order by the dt_column
  return(merged_dt)
}
# calling the example function
example_function(test_dt$test_column_one)
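How can I modify the code?
I want to avoid for loops and make use of optimized data.table syntax as much as possible.
I tried using unlist() as well as the data.table-specific .. syntax, but somehow I keep getting strange error messages and am not sure how to proceed.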
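You can build a character vector of the columns to merge inside the function and use it with the .. syntax. Also, when you want to take advantage of data.table's efficiency, use the set* functions (setorderv() in this case) to modify the table in place, rather than creating a reordered copy in a piped step.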
example_function <- function(dt_column, dt1 = other_dt, dt2 = test_dt) {
  # columns to keep from dt2: the join key plus the requested column(s)
  cols_to_merge <- c("common_column", dt_column)
  merged_dt <- merge(
    dt1,
    dt2[, ..cols_to_merge],
    by = "common_column"
  )
  # order by the dt_column, in place
  setorderv(merged_dt, dt_column)
  return(merged_dt)
}
example_function("test_column_one")
#    common_column additional_info test_column_one
#           <char>          <char>           <num>
# 1:           ID1              US               1
# 2:           ID2              US               2
# 3:           ID3              GB               3
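To illustrate the two points above, here is a minimal sketch on a hypothetical toy table (dt, with columns id and value, is made up for this example and is not part of the original post). It shows that the .. prefix makes data.table look up a variable in the calling scope rather than among the column names, and it uses data.table's address() to suggest that dt[order(...)] produces a new object while setorderv() reorders the existing one.

```r
library(data.table)

# Hypothetical toy data, only for illustration
dt <- data.table(id = c("ID1", "ID2", "ID3"), value = c(3, 1, 2))

# `..cols` means: use the variable `cols` from the calling scope,
# which holds column names as strings
cols <- c("id", "value")
subset_dt <- dt[, ..cols]

# Remember where dt lives in memory before sorting
address_before <- address(dt)

# dt[order(value)] returns a new, reordered data.table
sorted_copy <- dt[order(value)]
copy_made <- address(sorted_copy) != address_before

# setorderv() takes column names as strings and reorders dt by reference
setorderv(dt, "value")
same_object <- address(dt) == address_before
```

Because setorderv() accepts column names as a character vector, it fits naturally into a function that receives the column name as a string, which is what the answer's example_function() does.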