Paste multiple data.table columns into a single column based on unique values


I have a data.table that looks like this:

require("data.table")

dt1 <- data.table(VAR1 = c("Brick","Sand","Concrete","Stone"), VAR2 = c(100,23,76,43), VAR3 = c("Place","Location","Place","Vista"), VAR4 = c("Place","Tree","Wood","Vista"), VAR5 = c("Place","Tree","Wood","Forest"))

I want to paste the named columns together (my real data has additional columns) in this order: VAR2, VAR1, VAR3, VAR4 and VAR5. However, I have two conditions:

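  • Values within the same row should not be repeated (when a value repeats, the entry from the last column should be kept, so in my example the 'Place' in VAR5 would be kept)
  • When pasting, a comma should be the separator, except between VAR2 and VAR1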

My expected output would look like this:

dt2 <- data.table(VAR6 = c("100 Brick, Place","23 Sand, Location, Tree","76 Concrete, Place, Wood","43 Stone, Vista, Forest"))
r string data.table paste
1 Answer

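We can select the columns in the desired order with .SDcols, paste them together with do.call(paste, ...), and then remove the repeated words with a regular expression: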

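# Paste with ", " so the result matches the requested separator, drop any word
# that appears again later in the string, then turn the first ", "
# (between VAR2 and VAR1) into a space.
dt1[, .(VAR6 = sub(", ", " ", gsub("\\b(\\w+)\\b\\s*,\\s*(?=.*\\1)", "",
      do.call(paste, c(.SD, sep = ", ")), perl = TRUE))),
    .SDcols = names(dt1)[c(2:1, 3:5)]]
#                       VAR6
#1:         100 Brick, Place
#2:  23 Sand, Location, Tree
#3: 76 Concrete, Place, Wood
#4:  43 Stone, Vista, Forest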
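
To see what the regular expression above is doing, here is a small standalone sketch run on two made-up strings (they are not taken from dt1). The second pattern, `pat_strict`, is an assumed variation that is not part of the original answer: it adds word boundaries inside the lookahead so that a word is only dropped when it reappears later as a whole word, rather than as part of a longer one.

```r
# Illustrative input strings (invented for this demo, not from dt1)
x <- c("100, Brick, Place, Place, Place", "Place, Placement")

# Pattern from the answer: drop a word and its trailing comma when the same
# text shows up anywhere later in the string (even inside a longer word).
pat_answer <- "\\b(\\w+)\\b\\s*,\\s*(?=.*\\1)"

# Assumed hardening: the later occurrence must be a whole word.
pat_strict <- "\\b(\\w+)\\b\\s*,\\s*(?=.*\\b\\1\\b)"

res_answer <- gsub(pat_answer, "", x, perl = TRUE)
res_strict <- gsub(pat_strict, "", x, perl = TRUE)

print(res_answer)
# [1] "100, Brick, Place" "Placement"
print(res_strict)
# [1] "100, Brick, Place" "Place, Placement"
```

Both patterns keep the last occurrence of a repeated word, which is what the question asks for. Whether the stricter lookahead is needed depends on whether values in the real data can be substrings of one another.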

Or group by row number and paste within each group:

V6 <- dt1[, sprintf("%s %s, %s", VAR2, VAR1, 
   toString(unique(unlist(.SD)))), 1:nrow(dt1), .SDcols = VAR3:VAR5]$V1
data.table(V6)
#                         V6
#1:         100 Brick, Place
#2:  23 Sand, Location, Tree
#3: 76 Concrete, Place, Wood
#4:  43 Stone, Vista, Forest
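
One detail to be aware of: unique() keeps the first occurrence of each value, while the question asks for the last one to be kept. On the sample data the two give the same strings, but they can differ when a value repeats with something else in between (for example A, B, A). The sketch below is one possible variation, not part of the original answer; `keep_last` is a hypothetical helper name, and it relies on the `fromLast` argument of base R's duplicated().

```r
library(data.table)

dt1 <- data.table(
  VAR1 = c("Brick", "Sand", "Concrete", "Stone"),
  VAR2 = c(100, 23, 76, 43),
  VAR3 = c("Place", "Location", "Place", "Vista"),
  VAR4 = c("Place", "Tree", "Wood", "Vista"),
  VAR5 = c("Place", "Tree", "Wood", "Forest")
)

# Hypothetical helper: drop repeated values, keeping the LAST occurrence.
keep_last <- function(x) x[!duplicated(x, fromLast = TRUE)]

# Same row-wise approach as above, with keep_last() in place of unique().
V6 <- dt1[, sprintf("%s %s, %s", VAR2, VAR1,
                    toString(keep_last(unlist(.SD)))),
          by = seq_len(nrow(dt1)),
          .SDcols = c("VAR3", "VAR4", "VAR5")]$V1
print(V6)

# Where the two approaches differ:
print(unique(c("A", "B", "A")))     # "A" "B"
print(keep_last(c("A", "B", "A")))  # "B" "A"
```

For the sample data this yields the same four strings as the expected output in the question.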