Inverse selection with by in R data.table

Question (votes: 0, answers: 2)

I have the following data.table:

library(data.table)
DT <- data.table(id=c("A","A","B","B","C","C"),condition=c(1,2,1,2,1,2),value=c(0,1,1,3,2,2))

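For each combination of id and condition, I want to compute the mean of value for that condition, but over all id values other than the current one. My current solution is: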

DT[,meanothers:=DT[id!=ID & condition==CONDITION,mean(value)],by=.(ID=id,CONDITION=condition)]

Is there a faster or more memory-efficient way to do this in data.table?
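To make the target concrete, here is a small hand check for a single group. It is only a sketch: the intermediate name others is introduced for illustration and is not part of the solution above.

```r
library(data.table)

DT <- data.table(id = c("A", "A", "B", "B", "C", "C"),
                 condition = c(1, 2, 1, 2, 1, 2),
                 value = c(0, 1, 1, 3, 2, 2))

# Rows for condition 1 that do NOT belong to id "A": B (value 1) and C (value 2).
others <- DT[id != "A" & condition == 1]

# Their mean is the desired meanothers for the row (id = "A", condition = 1).
mean(others$value)
#> [1] 1.5
```

The solution above repeats this subset once per (id, condition) group, which is why a cheaper alternative is of interest.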

r data.table
2 Answers
3 votes

Using an idea from one of my older answers:

DT[, c("Sum", "N") := .(sum(value), .N), by = condition]
DT[, c("sum", "n") := .(sum(value), .N), by = .(id, condition)]
DT[, meanothers1 := (Sum - sum) / (N - n)]
#        id condition value meanothers   Sum     N   sum     n meanothers1
#    <char>     <num> <num>      <num> <num> <int> <num> <int>       <num>
# 1:      A         1     0        1.5     3     3     0     1         1.5
# 2:      A         2     1        2.5     6     3     1     1         2.5
# 3:      B         1     1        1.0     3     3     1     1         1.0
# 4:      B         2     3        1.5     6     3     3     1         1.5
# 5:      C         1     2        0.5     3     3     2     1         0.5
# 6:      C         2     2        2.0     6     3     2     1         2.0
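The idea is that the mean over the other ids equals (condition total minus the group's own sum) divided by (condition count minus the group's own count). A possible variant, shown here only as a sketch, computes the group's own sum and count inline and drops the helper columns afterwards. The column names cond_sum and cond_n are made up for this illustration.

```r
library(data.table)

DT <- data.table(id = c("A", "A", "B", "B", "C", "C"),
                 condition = c(1, 2, 1, 2, 1, 2),
                 value = c(0, 1, 1, 3, 2, 2))

# Step 1: totals per condition (temporary helper columns, illustrative names).
DT[, c("cond_sum", "cond_n") := .(sum(value), .N), by = condition]

# Step 2: within each (id, condition) group, subtract that group's own
# contribution from the condition totals and divide by the remaining count.
DT[, meanothers := (cond_sum - sum(value)) / (cond_n - .N), by = .(id, condition)]

# Step 3: drop the helpers so only the result column remains.
DT[, c("cond_sum", "cond_n") := NULL]

DT$meanothers
#> [1] 1.5 2.5 1.0 1.5 0.5 2.0
```

Note that if a condition contains only a single id, the denominator is zero and the result for those rows is not a finite number, so that case may need separate handling.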

0 votes
library(data.table)

dt <- data.table(id=c("A","A","B","B","C","C"),condition=c(1,2,1,2,1,2),value=c(0,1,1,3,2,2))

dt[, meanothers := mapply(
  FUN = function(x, y) weighted.mean(value, w = (x != id) * (y == condition)),
  x = id,
  y = condition
)][]
#>    id condition value meanothers
#> 1:  A         1     0        1.5
#> 2:  A         2     1        2.5
#> 3:  B         1     1        1.0
#> 4:  B         2     3        1.5
#> 5:  C         1     2        0.5
#> 6:  C         2     2        2.0

Created on 2024-03-06 with reprex v2.0.2
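The code above works because weighted.mean() with 0/1 weights reduces to the plain mean of the rows whose weight is 1. The sketch below reproduces the weights for one row, (id = "A", condition = 1), using plain vectors in place of the table columns.

```r
value     <- c(0, 1, 1, 3, 2, 2)
id        <- c("A", "A", "B", "B", "C", "C")
condition <- c(1, 2, 1, 2, 1, 2)

# Weight is 1 only where the id differs from "A" AND the condition equals 1.
w <- (id != "A") * (condition == 1)
w
#> [1] 0 0 1 0 1 0

# With 0/1 weights, weighted.mean() is the plain mean of the selected values.
weighted.mean(value, w = w)
#> [1] 1.5
```

Since mapply() builds a full-length weight vector for every row, the work grows roughly quadratically with the number of rows, so this approach is compact but not necessarily faster on large tables.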
