I have the following data.table:
library(data.table)
DT <- data.table(id = c("A", "A", "B", "B", "C", "C"), condition = c(1, 2, 1, 2, 1, 2), value = c(0, 1, 1, 3, 2, 2))
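For each combination of id and condition, I want to compute the mean of value for that condition, taken over all ids other than the current one. My current solution is: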
DT[,meanothers:=DT[id!=ID & condition==CONDITION,mean(value)],by=.(ID=id,CONDITION=condition)]
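To make the target concrete, here is a minimal sketch for a single cell, using the same data. For id A under condition 1, only the condition-1 rows of B and C should enter the mean. The name `target_A1` is introduced here purely for illustration.

```r
library(data.table)

DT <- data.table(
  id        = c("A", "A", "B", "B", "C", "C"),
  condition = c(1, 2, 1, 2, 1, 2),
  value     = c(0, 1, 1, 3, 2, 2)
)

# Condition-1 rows that do not belong to id "A": B (value 1) and C (value 2)
target_A1 <- DT[id != "A" & condition == 1, mean(value)]
target_A1
# [1] 1.5
```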
Is there a faster or more memory-efficient way to do this in data.table?
Using the idea from one of my older answers:
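# Total and row count per condition, across all ids
DT[, c("Sum", "N") := .(sum(value), .N), by = condition]
# Total and row count per (id, condition), i.e. the current id's own share
DT[, c("sum", "n") := .(sum(value), .N), by = .(id, condition)]
# Mean over the other ids: remove the current id's share from both totals
DT[, meanothers1 := (Sum - sum) / (N - n)]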
#        id condition value meanothers   Sum     N   sum     n meanothers1
#    <char>     <num> <num>      <num> <num> <int> <num> <int>       <num>
# 1:      A         1     0        1.5     3     3     0     1         1.5
# 2:      A         2     1        2.5     6     3     1     1         2.5
# 3:      B         1     1        1.0     3     3     1     1         1.0
# 4:      B         2     3        1.5     6     3     3     1         1.5
# 5:      C         1     2        0.5     3     3     2     1         0.5
# 6:      C         2     2        2.0     6     3     2     1         2.0
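One way to gain confidence in the sum/count approach is to compare it with the self-subsetting approach from the question on data where an id can occur several times within a condition. The sketch below does that. The table `DT2`, the seed, the sizes, and the column names `ref` and `fast` are all made up for illustration and are not part of the original question or answer.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical test data: ids may repeat within a condition
DT2 <- data.table(
  id        = sample(LETTERS[1:5], 200, replace = TRUE),
  condition = sample(1:3, 200, replace = TRUE),
  value     = rnorm(200)
)

# Reference result: the self-subsetting approach from the question
DT2[, ref := DT2[id != ID & condition == CONDITION, mean(value)],
    by = .(ID = id, CONDITION = condition)]

# Sum/count approach: totals per condition, then per (id, condition)
DT2[, c("Sum", "N") := .(sum(value), .N), by = condition]
DT2[, c("sum", "n") := .(sum(value), .N), by = .(id, condition)]
DT2[, fast := (Sum - sum) / (N - n)]

# Drop the helper columns once the result is computed
DT2[, c("Sum", "N", "sum", "n") := NULL]

# The two columns should agree up to floating-point tolerance
all.equal(DT2$ref, DT2$fast)
```

If a condition contains only a single id, `N - n` is zero and the formula yields `NaN`, which is also what `mean()` returns for the empty subset in the question's approach.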
library(data.table)
dt <- data.table(id = c("A", "A", "B", "B", "C", "C"), condition = c(1, 2, 1, 2, 1, 2), value = c(0, 1, 1, 3, 2, 2))
dt[, meanothers := mapply(
  FUN = function(x, y) weighted.mean(value, w = (x != id) * (y == condition)),
  x = id,
  y = condition
)][]
#>    id condition value meanothers
#> 1:  A         1     0        1.5
#> 2:  A         2     1        2.5
#> 3:  B         1     1        1.0
#> 4:  B         2     3        1.5
#> 5:  C         1     2        0.5
#> 6:  C         2     2        2.0
Created on 2024-03-06 with reprex v2.0.2
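The `weighted.mean()` call above relies on 0/1 weights acting as a row mask: rows with weight 0 drop out, so the result is the plain mean of the rows with weight 1. A minimal base R sketch of that idea, using the three condition-1 values from the example (the vectors `value`, `id`, `w` and the name `res` are standalone illustrations, not columns of `dt`):

```r
# Condition-1 values for ids A, B, C from the example data
value <- c(0, 1, 2)
id    <- c("A", "B", "C")

# 0/1 mask: weight 1 for every id except "A", weight 0 for "A" itself
w <- as.numeric(id != "A")

# Weighted mean with 0/1 weights equals the mean of the unmasked values
res <- weighted.mean(value, w = w)
res
# [1] 1.5
```

Since `mapply()` evaluates this mask against the full columns once per row, the work presumably grows with the square of the row count, which may matter on large tables.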