Max by Group with Condition for data.table

Question (votes: 1, answers: 2)

I have data like this:

library(data.table)
group <- c("a","a","a","b","b","b")
cond <- c("N","Y","N","Y","Y","N")
value <- c(2,1,3,4,2,5)

dt <- data.table(group, cond, value)

group cond value
a     N    2
a     Y    1
a     N    3
b     Y    4
b     Y    2
b     N    5

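For each group, I want to return the maximum of value among the rows where cond is "Y", repeated on every row of the group. Something like this: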

group cond value max
a     N    2     1
a     Y    1     1
a     N    3     1
b     Y    4     4
b     Y    2     4
b     N    5     4

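I have tried adding an ifelse condition to the grouped max, but I end up with the unconditional group max on rows where cond is "Y", and NA on rows that don't meet the condition: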

dt[, max := ifelse(cond=="Y", max(value), NA), by = group]
r data.table
2 Answers
2 votes

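Assuming that for each 'group' we need the max of 'value' where 'cond' is "Y": after grouping by 'group', subset 'value' with the logical condition (cond == 'Y') and take the max of the result.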

dt[, max := max(value[cond == 'Y']), by = group]
dt
#   group cond value max
#1:     a    N     2   1
#2:     a    Y     1   1
#3:     a    N     3   1
#4:     b    Y     4   4
#5:     b    Y     2   4
#6:     b    N     5   4
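
One edge case that may be worth guarding against: if a group has no rows with cond == 'Y', the subset value[cond == 'Y'] is empty, and max() on a zero-length vector returns -Inf with a warning. A minimal sketch, assuming NA is the preferred result in that case; the extra group "c" and the name dt2 are hypothetical and not part of the question's data:

```r
library(data.table)

# Hypothetical data: group "c" has no rows with cond == "Y"
dt2 <- data.table(
  group = c("a", "a", "a", "b", "b", "b", "c"),
  cond  = c("N", "Y", "N", "Y", "Y", "N", "N"),
  value = c(2, 1, 3, 4, 2, 5, 7)
)

# Take the conditional max only when at least one row qualifies;
# otherwise return NA_real_ so the column type stays numeric in every group
dt2[, max := if (any(cond == "Y")) max(value[cond == "Y"]) else NA_real_,
    by = group]

dt2$max
# 1 1 1 4 4 4 NA
```

The if/else is evaluated once per group, so it costs little, and NA_real_ keeps the result type consistent with the other groups.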

2 votes

You could...

dt[CJ(group = group, cond = "Y", unique=TRUE), on=.(group, cond), 
  .(mv = max(value))
, by=.EACHI]

#    group cond mv
# 1:     a    Y  1
# 2:     b    Y  4

Using a join like this will eventually have the max calculation optimized.
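
The join above returns one row per group rather than a new column on dt. One possible way to get the layout asked for in the question is an update join back onto the original table. This is only a sketch: the names groups_y and res are hypothetical, and the CJ() call is written with unique(dt$group) instead of the form used above so that the snippet does not depend on a separate group vector existing.

```r
library(data.table)

dt <- data.table(
  group = c("a", "a", "a", "b", "b", "b"),
  cond  = c("N", "Y", "N", "Y", "Y", "N"),
  value = c(2, 1, 3, 4, 2, 5)
)

# One lookup row per group, each paired with cond == "Y"
groups_y <- CJ(group = unique(dt$group), cond = "Y")

# Per-group max over the matching rows (one result row per row of groups_y)
res <- dt[groups_y, on = .(group, cond), .(mv = max(value)), by = .EACHI]

# Update join: copy each group's mv onto every row of that group in dt
dt[res, on = .(group), mv := i.mv]

dt$mv
# 1 1 1 4 4 4
```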


Another way (originally included in @akrun's answer):

dt[cond == "Y", mv := max(value), by=group]

From the link above, we can see that this approach is already optimized, except for the := part.
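
Note that assigning with a filter in i only touches the rows where cond == "Y"; the remaining rows are left as NA, which differs from the output requested in the question. A small sketch of that behaviour, followed by one possible (assumed, not from the original answers) second step that spreads the value across each group:

```r
library(data.table)

dt <- data.table(
  group = c("a", "a", "a", "b", "b", "b"),
  cond  = c("N", "Y", "N", "Y", "Y", "N"),
  value = c(2, 1, 3, 4, 2, 5)
)

# Only rows with cond == "Y" are assigned; the others stay NA
dt[cond == "Y", mv := max(value), by = group]
mv_filtered <- copy(dt$mv)
# NA 1 NA 4 4 NA

# Optional second step: reuse the first non-NA value within each group
# (yields NA for a group that has no "Y" rows at all)
dt[, mv := mv[!is.na(mv)][1L], by = group]

dt$mv
# 1 1 1 4 4 4
```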
