Rank a variable by group using data.table

I am trying to rank one variable within groups defined by two other variables, using frank from data.table. I can't get the by argument to work the way I expect.

Here is my data:

library(data.table)
# .internal.selfref = <pointer: ...> removed from the dput() output (not valid R); setDT() rebuilds it
z <- structure(list(indpn = c(170, 170, 170, 170, 170, 170, 9870, 
9870, 9870, 9870, 9870, 9870), occpn = c(6050, 9130, 205, 5120, 
5740, 6005, 3930, 700, 1410, 3645, 1050, 150), ncwc = c(258575, 
4747, 10742, 205, 867, 11026, 0, 0, 0, 0, 0, 0)), row.names = c(NA, 
-12L), class = "data.frame")
setDT(z)

Here is the code I am using:

z[ , therank := frank( -ncwc , ties.method ="min" ) , by = .(indpn, occpn) ]

Here is what I get:

    indpn occpn   ncwc therank
 1:   170  6050 258575       1
 2:   170  9130   4747       1
 3:   170   205  10742       1
 4:   170  5120    205       1
 5:   170  5740    867       1
 6:   170  6005  11026       1
 7:  9870  3930      0       1
 8:  9870   700      0       1
 9:  9870  1410      0       1
10:  9870  3645      0       1
11:  9870  1050      0       1
12:  9870   150      0       1

I expect therank to be 1, 4, 3, 6, 5, 2, 1, 1, 1, 1, 1, 1.
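One quick way to see why every rank comes out as 1 is to count the rows in each (indpn, occpn) group. The sketch below rebuilds the question's data with data.table() (assumed equivalent to the dput() output) and uses .N to count rows per group.

```r
library(data.table)

# Same values as the question's data, built directly with data.table()
z <- data.table(
  indpn = c(rep(170, 6), rep(9870, 6)),
  occpn = c(6050, 9130, 205, 5120, 5740, 6005, 3930, 700, 1410, 3645, 1050, 150),
  ncwc  = c(258575, 4747, 10742, 205, 867, 11026, rep(0, 6))
)

# Rows per (indpn, occpn) group: every combination appears exactly once,
# so frank() only ever sees one value per group and returns rank 1
group_sizes <- z[, .N, by = .(indpn, occpn)]
print(group_sizes)
```

Since all 12 groups have N equal to 1, the by argument is doing what it was told. The grouping itself is what needs to change.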

Answer

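As noted in the comments, grouping by indpn alone gives the expected output. Grouping by .(indpn, occpn) puts every row in its own group, because each indpn/occpn combination occurs only once in this data, so frank always returns 1.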

library(data.table)
z[ , therank := frank(-ncwc , ties.method ="min" ) ,indpn]
z

#    indpn occpn   ncwc therank
# 1:   170  6050 258575       1
# 2:   170  9130   4747       4
# 3:   170   205  10742       3
# 4:   170  5120    205       6
# 5:   170  5740    867       5
# 6:   170  6005  11026       2
# 7:  9870  3930      0       1
# 8:  9870   700      0       1
# 9:  9870  1410      0       1
#10:  9870  3645      0       1
#11:  9870  1050      0       1
#12:  9870   150      0       1
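As a cross-check, the same per-group ranking can be reproduced in base R with ave() and rank(). This is a swapped-in technique, not part of the answer; it assumes the question's data, rebuilt here with data.table() so the block runs on its own.

```r
library(data.table)

z <- data.table(
  indpn = c(rep(170, 6), rep(9870, 6)),
  occpn = c(6050, 9130, 205, 5120, 5740, 6005, 3930, 700, 1410, 3645, 1050, 150),
  ncwc  = c(258575, 4747, 10742, 205, 867, 11026, rep(0, 6))
)

# data.table version from the answer: group by indpn only
z[, therank := frank(-ncwc, ties.method = "min"), by = indpn]

# Base R equivalent: ave() applies rank() to -ncwc separately within each indpn group
base_rank <- ave(-z$ncwc, z$indpn, FUN = function(v) rank(v, ties.method = "min"))

print(z)
print(base_rank)
```

Both approaches negate ncwc so that the largest value gets rank 1, and both use the "min" ties method, so the five zeros in the 9870 group all share rank 1.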

Note, however, how frank handles ties. Is this the output you are looking for?

z$ncwc[12] <- -1
z[ , therank := frank( -ncwc , ties.method ="min" ) ,indpn]
z
#    indpn occpn   ncwc therank
# 1:   170  6050 258575       1
# 2:   170  9130   4747       4
# 3:   170   205  10742       3
# 4:   170  5120    205       6
# 5:   170  5740    867       5
# 6:   170  6005  11026       2
# 7:  9870  3930      0       1
# 8:  9870   700      0       1
# 9:  9870  1410      0       1
#10:  9870  3645      0       1
#11:  9870  1050      0       1
#12:  9870   150     -1       6
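The 6 comes from the "min" ties method: the five tied zeros occupy ranks 1 to 5, so the next distinct value gets rank 6. The sketch below compares several of frank's documented ties.method options on the negated 9870 group (written directly as 0, 0, 0, 0, 0, 1) to show how each treats the same ties.

```r
library(data.table)

# -ncwc for the indpn == 9870 group after ncwc[12] is set to -1
x <- c(0, 0, 0, 0, 0, 1)

# One column per ties method, one row per element of x
methods <- c("min", "max", "average", "first", "dense")
ranks <- sapply(methods, function(m) frank(x, ties.method = m))
print(ranks)
```

Only "dense" gives the last value rank 2, because it ranks distinct values consecutively without leaving gaps after ties.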

If you expect the last value to be 2 rather than 6, you can combine match with unique:

z[order(-ncwc) , therank := match(ncwc, unique(ncwc)) ,indpn]
z
#    indpn occpn   ncwc therank
# 1:   170  6050 258575       1
# 2:   170  9130   4747       4
# 3:   170   205  10742       3
# 4:   170  5120    205       6
# 5:   170  5740    867       5
# 6:   170  6005  11026       2
# 7:  9870  3930      0       1
# 8:  9870   700      0       1
# 9:  9870  1410      0       1
#10:  9870  3645      0       1
#11:  9870  1050      0       1
#12:  9870   150     -1       2
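frank also accepts ties.method = "dense", which should give the same result as match() with unique() here without needing the order() in i. The sketch below computes both columns side by side (rank_match and rank_dense are hypothetical names chosen for illustration) on the question's data with the same ncwc[12] change.

```r
library(data.table)

z <- data.table(
  indpn = c(rep(170, 6), rep(9870, 6)),
  occpn = c(6050, 9130, 205, 5120, 5740, 6005, 3930, 700, 1410, 3645, 1050, 150),
  ncwc  = c(258575, 4747, 10742, 205, 867, 11026, rep(0, 6))
)
z[12, ncwc := -1]  # same modification as in the answer

# Approach from the answer: within each indpn group, rows are visited in
# descending ncwc order, and match() maps each value to its position in unique()
z[order(-ncwc), rank_match := match(ncwc, unique(ncwc)), by = indpn]

# Alternative: dense ranking, where tied values share a rank and the next
# distinct value gets the next consecutive rank
z[, rank_dense := frank(-ncwc, ties.method = "dense"), by = indpn]

print(z)
```

The dense version keeps the ranking logic in one function call, while the match()/unique() version relies on the row order set by order(-ncwc) in i.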
© www.soinside.com 2019 - 2024. All rights reserved.