My goal is to apply multiple functions to multiple columns AND have GForce turned on.
Suppose I have the following data.table:
library(data.table)
df <- data.table(fruit = c('a', 'a', 'a', 'b')
, revenue = 1:4
, profit = c(2,NA,4,5)
); df
fruit revenue profit
1: a 1 2
2: a 2 NA
3: a 3 4
4: b 4 5
and I want to apply multiple functions to multiple columns (all columns except fruit):
# functions
y <- \(i) {c(min(i, na.rm = T)
, max(i, na.rm = T)
)
}
# apply
df[, lapply(.SD, y)
, fruit
, verbose = T
]
Finding groups using forderv ... forder.c received 4 rows and 1 columns
0.000s elapsed (0.000s cpu)
Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu)
lapply optimization changed j from 'lapply(.SD, y)' to 'list(y(revenue), y(profit))'
GForce is on, left j unchanged
Old mean optimization is on, left j unchanged.
Making each group and running j (GForce FALSE) ...
memcpy contiguous groups took 0.000s for 2 groups
eval(j) took 0.012s for 2 calls
0.020s elapsed (0.020s cpu)
fruit revenue profit
1: a 1 2
2: a 3 4
3: b 4 5
4: b 4 5
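The above works! However, note that the verbose output says (GForce FALSE), so GForce is NOT turned on.
I think this is because, as Waldi pointed out, GForce is NOT turned on when using \(i) sum(i).
I then tried the following, passing na.rm = T only in the lapply call: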
# functions
z <- \(i) {c(min
, max
)
}
# apply
df[, lapply(.SD, z, na.rm = T)
, fruit
, verbose = T
]
Finding groups using forderv ... forder.c received 4 rows and 1 columns
0.000s elapsed (0.000s cpu)
Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu)
lapply optimization changed j from 'lapply(.SD, z, na.rm = T)' to 'list(z(revenue, na.rm = T), z(profit, na.rm = T))'
GForce is on, left j unchanged
Old mean optimization is on, left j unchanged.
Making each group and running j (GForce FALSE) ... Error in z(revenue, na.rm = T) : unused argument (na.rm = T)
This time it errors, as shown above. Specifically:
Error in z(revenue, na.rm = T) : unused argument (na.rm = T)
Any help would be greatly appreciated.
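The error can be reproduced outside data.table, which suggests it comes from the definition of z rather than from the grouping machinery. The sketch below is only an illustration: z2 is a hypothetical name, not something from the original post. It shows that z has no `...` (and never uses its argument), so lapply cannot hand na.rm on to it.

```r
# z ignores its argument and returns a list holding the two functions
# min and max; it has no `...`, so an extra argument is rejected.
z <- \(i) c(min, max)

msg <- tryCatch(
  z(1:3, na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
print(msg)

# Hypothetical fix (z2 is an illustrative name): accept `...` and
# forward it to the functions that understand na.rm.
z2 <- \(i, ...) c(min(i, ...), max(i, ...))
z2(c(2, NA, 4), na.rm = TRUE)
```

A wrapper like z2 removes the error, but it is still a user-defined function, so by the same reasoning as for y it would presumably still run with (GForce FALSE).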
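The only relatively simple suggestion I can give is not to try to do this in a single df[] call, but to make two separate calls so that the optimization can kick in. For example: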
## bigger data example
df <- data.table(
fruit = rep(1:2e6, each=2)
, revenue = 1:4
, profit = c(2,NA,4,5)
)
rbind(
df[, lapply(.SD, min, na.rm=TRUE), by=fruit, verbose=TRUE],
df[, lapply(.SD, max, na.rm=TRUE), by=fruit, verbose=TRUE]
)[order(fruit)]
##Making each group and running j (GForce TRUE) ... gforce initial population of grp took 0.008
##Making each group and running j (GForce TRUE) ... gforce initial population of grp took 0.002
##0.060s elapsed (0.050s cpu)
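One drawback of stacking the two results with rbind is that nothing marks which row is the minimum and which is the maximum. A possible variation, sketched here on the small data from the question, is to use rbindlist with a named list and idcol. The names small, res and stat are illustrative choices, not part of the original answer.

```r
library(data.table)

# the small example data from the question
small <- data.table(
  fruit   = c("a", "a", "a", "b"),
  revenue = 1:4,
  profit  = c(2, NA, 4, 5)
)

# one grouped call per function, so each j stays a plain lapply(.SD, fun)
res <- rbindlist(
  list(
    min = small[, lapply(.SD, min, na.rm = TRUE), by = fruit],
    max = small[, lapply(.SD, max, na.rm = TRUE), by = fruit]
  ),
  idcol = "stat"  # the list names become a label column called "stat"
)[order(fruit)]

res
```

Each inner call has the same shape as the ones in the rbind example, so the same optimization should apply; adding verbose = TRUE to the inner calls is the way to check.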
## for comparison: the single-call version from the question
y <- function(i) {
c(min(i, na.rm = T),
max(i, na.rm = T))
}
# apply
df[
, lapply(.SD, y)
, fruit
, verbose = T
]
##Making each group and running j (GForce FALSE) ...
##3.760s elapsed (3.770s cpu)
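If a wide result (one row per group) is acceptable, another option is to keep a single call but write the min and max calls directly in j instead of hiding them inside a wrapper. This is a sketch under the assumption that GForce is applied when j is a plain list of calls to supported functions such as min and max; the verbose output should confirm or refute that on a given data.table version. The output column names are illustrative.

```r
library(data.table)

# the small example data from the question
small <- data.table(
  fruit   = c("a", "a", "a", "b"),
  revenue = 1:4,
  profit  = c(2, NA, 4, 5)
)

# direct calls to min/max in j, no user-defined wrapper around them
wide <- small[
  , .(revenue_min = min(revenue, na.rm = TRUE),
      revenue_max = max(revenue, na.rm = TRUE),
      profit_min  = min(profit,  na.rm = TRUE),
      profit_max  = max(profit,  na.rm = TRUE))
  , by = fruit
  , verbose = TRUE  # look for "GForce TRUE" in the printed log
]

wide
```

The trade-off is that every column and function pair has to be spelled out, which is why the two-call lapply(.SD, ...) approach scales better to many columns.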