data.table: conditional mean of a column by group, added to all rows


Suppose I have the following data:

library(data.table)

DT <- data.table(id = c(1,1,1,2,2,3,3),
                bank=c("a","b","b","b","b","a","b"),
                rate =c(2,3,4,1,0.5,7,3.2),
                balance=c(10,5,11,3,20,0.5,2),
                new=c(1,0,0,1,0,0,1),
                before=c(1,0,1,1,0,1,0))

    id bank rate balance new before
1:  1    a  2.0    10.0   1      1
2:  1    b  3.0     5.0   0      0
3:  1    b  4.0    11.0   0      1
4:  2    b  1.0     3.0   1      1
5:  2    b  0.5    20.0   0      0
6:  3    a  7.0     0.5   0      1
7:  3    b  3.2     2.0   1      0

I want to add a column to every row that contains the bank's average rate on new loans (new == 1). The best I could come up with is:

aft <- DT[new == 1, .(mrate = mean(rate)), by = bank]
aft <- merge(DT, aft, by = "bank", all.x = TRUE)

   bank id rate balance new before mrate
1:    a  1  2.0    10.0   1      1   2.0
2:    a  3  7.0     0.5   0      1   2.0
3:    b  1  3.0     5.0   0      0   2.1
4:    b  1  4.0    11.0   0      1   2.1
5:    b  2  1.0     3.0   1      1   2.1
6:    b  2  0.5    20.0   0      0   2.1
7:    b  3  3.2     2.0   1      0   2.1

Is there a way to avoid the merge step? Any help is appreciated. Thanks.
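Editor's note, not part of the original question: if the only goal is to avoid merge() building a new, bank-sorted copy, the two-step logic can be kept and the lookup added by reference with a data.table update join (`on =` together with `:=`). This is a sketch under that assumption; `i.mrate` is data.table's prefix for a column of the joined table `aft`. Rows whose bank has no match in `aft` would be left as NA.

```r
library(data.table)

# Same sample data as in the question
DT <- data.table(id = c(1,1,1,2,2,3,3),
                 bank = c("a","b","b","b","b","a","b"),
                 rate = c(2,3,4,1,0.5,7,3.2),
                 balance = c(10,5,11,3,20,0.5,2),
                 new = c(1,0,0,1,0,0,1),
                 before = c(1,0,1,1,0,1,0))

# Step 1: per-bank mean rate of new loans (as in the question)
aft <- DT[new == 1, .(mrate = mean(rate)), by = bank]

# Step 2: update join. Adds mrate to DT in place, matched on bank,
# keeping DT's original row order; unmatched banks would get NA.
DT[aft, on = "bank", mrate := i.mrate]
print(DT)
```

Unlike merge(), this keeps the rows in their original order. It still needs the helper table, which the answer below avoids.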

Answer
DT[, mrate := mean(rate[new == 1]), by = .(bank)]
#       id   bank  rate balance   new before mrate
#    <num> <char> <num>   <num> <num>  <num> <num>
# 1:     1      a   2.0    10.0     1      1   2.0
# 2:     1      b   3.0     5.0     0      0   2.1
# 3:     1      b   4.0    11.0     0      1   2.1
# 4:     2      b   1.0     3.0     1      1   2.1
# 5:     2      b   0.5    20.0     0      0   2.1
# 6:     3      a   7.0     0.5     0      1   2.0
# 7:     3      b   3.2     2.0     1      0   2.1
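Why this works without a join: with `by = .(bank)`, the columns referenced in j are restricted to the current bank's rows, so `rate[new == 1]` selects only that bank's new-loan rates, and `:=` recycles the single group mean over every row of the group. A minimal sketch that shows the per-group pieces and cross-checks the result against a plain per-bank lookup (the `n_new`, `means` and `m` names are illustrative only):

```r
library(data.table)

DT <- data.table(id = c(1,1,1,2,2,3,3),
                 bank = c("a","b","b","b","b","a","b"),
                 rate = c(2,3,4,1,0.5,7,3.2),
                 balance = c(10,5,11,3,20,0.5,2),
                 new = c(1,0,0,1,0,0,1),
                 before = c(1,0,1,1,0,1,0))

# Per group: how many new-loan rows feed each mean, and the mean itself
print(DT[, .(n_new = sum(new == 1), mrate = mean(rate[new == 1])), by = bank])

# The answer's one-liner: one mean per bank, recycled to all its rows
DT[, mrate := mean(rate[new == 1]), by = .(bank)]

# Cross-check against a per-bank lookup table matched by bank
means <- DT[new == 1, .(m = mean(rate)), by = bank]
stopifnot(isTRUE(all.equal(DT$mrate, means$m[match(DT$bank, means$bank)])))
```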

If a bank has no rows with new == 1, you get NaN for that bank:

DT[1, new := 0]  # bank "a" now has no rows with new == 1
DT[, mrate := mean(rate[new == 1]), by = .(bank)]
#       id   bank  rate balance   new before mrate
#    <num> <char> <num>   <num> <num>  <num> <num>
# 1:     1      a   2.0    10.0     0      1   NaN
# 2:     1      b   3.0     5.0     0      0   2.1
# 3:     1      b   4.0    11.0     0      1   2.1
# 4:     2      b   1.0     3.0     1      1   2.1
# 5:     2      b   0.5    20.0     0      0   2.1
# 6:     3      a   7.0     0.5     0      1   NaN
# 7:     3      b   3.2     2.0     1      0   2.1
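The NaN comes from base R rather than from data.table: for bank a, `rate[new == 1]` is now a zero-length vector, and `mean()` of an empty numeric vector is 0/0, i.e. NaN. A quick check with illustrative vectors standing in for bank a's rows:

```r
# Bank "a" after the change: two rates, neither row flagged as new
x <- c(2, 7)
flag <- c(0, 0)

x[flag == 1]               # numeric(0)
mean(x[flag == 1])         # NaN
is.nan(mean(numeric(0)))   # TRUE
is.na(mean(numeric(0)))    # TRUE: is.na() also counts NaN as missing
```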

If that is a problem, you can wrap the mean in fcoalesce to supply a default value; I use -1 here just to make it clear what I mean:

DT[, mrate := fcoalesce(mean(rate[new == 1]), -1), by = .(bank)]
#       id   bank  rate balance   new before mrate
#    <num> <char> <num>   <num> <num>  <num> <num>
# 1:     1      a   2.0    10.0     0      1  -1.0
# 2:     1      b   3.0     5.0     0      0   2.1
# 3:     1      b   4.0    11.0     0      1   2.1
# 4:     2      b   1.0     3.0     1      1   2.1
# 5:     2      b   0.5    20.0     0      0   2.1
# 6:     3      a   7.0     0.5     0      1  -1.0
# 7:     3      b   3.2     2.0     1      0   2.1
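If you would rather have a genuine NA than a sentinel such as -1, one alternative (an editorial sketch, not from the original answer) is to branch explicitly inside j. Both branches return a double, so every group yields the same column type, which data.table requires when assigning by group:

```r
library(data.table)

DT <- data.table(id = c(1,1,1,2,2,3,3),
                 bank = c("a","b","b","b","b","a","b"),
                 rate = c(2,3,4,1,0.5,7,3.2),
                 balance = c(10,5,11,3,20,0.5,2),
                 new = c(1,0,0,1,0,0,1),
                 before = c(1,0,1,1,0,1,0))
DT[1, new := 0]  # as in the answer: bank "a" has no new loans

# Banks without new loans get NA_real_ instead of NaN or -1
DT[, mrate := if (any(new == 1)) mean(rate[new == 1]) else NA_real_,
   by = .(bank)]
print(DT)
```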