Here is some sample data:
library(data.table)
df <- data.table(cake = c(1, 2, 3, 4, 5, 6, 7, "c1", "c1", "c1", "c2", "c2", "c3", "c3", "c3", "c3"), walk = c(183, 789, 753, 130, 126, 44, 325, 710, 307, 264, 708, 769, 742, 559, 181, 138))
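I would like to add a column final to this data.table that equals column walk only when the corresponding entry in column cake is unique. If the cake entry is not unique, i.e. it occurs on several rows, take the minimum of all walk values for that entry and show it only on the top row of the group; the remaining rows can be set to zero.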
For example, as cake:final pairs: 1:183, 2:789, ..., c1:264, c3:138, and so on.
Ideally, this would be the final column:
final=c(183, 789, 753, 130, 126, 44, 325, 264, 0, 0, 708, 0, 181, 0, 0, 0)
I have tried the following code, but it gives the wrong result:
df[, is_unique := !duplicated(cake)]
df[, cake_count := .N, by = cake]
df[, min_walk := ifelse(duplicated(cake), min(walk), walk)]
df[, final := ifelse(is_unique, min_walk, 0)]
I would appreciate a solution that uses the data.table package, since I believe data.table performs better on very large datasets.
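If, in the future, I need to repeat the minimum value on all non-unique entries instead of setting them to zero, please give me the code for that as well.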
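One possible solution:
# Note: := overwrites walk by reference; the trailing [order(...)] only sorts the printed copy.
df[, minwalk := min(walk), by = "cake"][minwalk != walk, walk := 0][, minwalk := NULL][order(cake, -walk)]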
#>     cake walk
#>  1:    1  183
#>  2:    2  789
#>  3:    3  753
#>  4:    4  130
#>  5:    5  126
#>  6:    6   44
#>  7:    7  325
#>  8:   c1  264
#>  9:   c1    0
#> 10:   c1    0
#> 11:   c2  708
#> 12:   c2    0
#> 13:   c3  138
#> 14:   c3    0
#> 15:   c3    0
#> 16:   c3    0
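The chain above overwrites walk by reference, and only the printed copy is sorted. Below is a minimal sketch of a variant that leaves walk untouched and writes the result to a new column. It assumes data.table 1.12.4 or later (for fifelse()) and assumes that, when a group has tied minima, only the first one should be kept, which is what which.min() picks.

```r
library(data.table)

df <- data.table(
  cake = c(1, 2, 3, 4, 5, 6, 7, "c1", "c1", "c1", "c2", "c2", "c3", "c3", "c3", "c3"),
  walk = c(183, 789, 753, 130, 126, 44, 325, 710, 307, 264, 708, 769, 742, 559, 181, 138)
)

# Within each cake group, keep walk only on the row holding the (first) minimum
# and set every other row to 0. walk itself is not modified.
df[, final := fifelse(seq_len(.N) == which.min(walk), walk, 0), by = cake]

# Optional: a sorted copy with the minimum at the top of each group.
sorted <- df[order(cake, -final)]
```

Here final keeps the minimum on the row where it occurs; sorted is only needed if the minimum has to appear first within each group.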
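# goal is the expected column exactly as given in the question
# (it lists 181 for the first c3 row, although min(walk) for c3 is 138)
df$goal = c(183, 789, 753, 130, 126, 44, 325, 264, 0, 0, 708, 0, 181, 0, 0, 0)
# per cake group: a single row keeps its walk; otherwise the minimum goes on top, zeros below
df[, result := if(.N == 1) walk else c(min(walk), rep(0, .N - 1)), by = cake]
df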
#     cake walk goal result
#  1:    1  183  183    183
#  2:    2  789  789    789
#  3:    3  753  753    753
#  4:    4  130  130    130
#  5:    5  126  126    126
#  6:    6   44   44     44
#  7:    7  325  325    325
#  8:   c1  710  264    264
#  9:   c1  307    0      0
# 10:   c1  264    0      0
# 11:   c2  708  708    708
# 12:   c2  769    0      0
# 13:   c3  742  181    138
# 14:   c3  559    0      0
# 15:   c3  181    0      0
# 16:   c3  138    0      0
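For the follow-up in the question (repeating the minimum on every row of a non-unique group instead of filling with zeros), a plain grouped minimum is probably enough. This is a sketch, and the column name final_rep is a hypothetical name chosen for illustration.

```r
library(data.table)

df <- data.table(
  cake = c(1, 2, 3, 4, 5, 6, 7, "c1", "c1", "c1", "c2", "c2", "c3", "c3", "c3", "c3"),
  walk = c(183, 789, 753, 130, 126, 44, 325, 710, 307, 264, 708, 769, 742, 559, 181, 138)
)

# min(walk) is recycled to every row of its cake group.
# Single-row groups simply get their own walk back, so no special case is needed.
df[, final_rep := min(walk), by = cake]
```

The same reasoning applies to the zero-filled version: with .N == 1, rep(0, .N - 1) is empty, so c(min(walk), rep(0, .N - 1)) already returns walk and the if branch is optional.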