Remove words from a string if they are repeated within a group of rows


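I have a table in which each value in the first column (V1) ends with at least two words run together. For each row, I need to keep only one of those words in the longer string: the one given in V2.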

library(qdap)
library(magrittr)

t <- read.table(text = "
V1,V2
  Video of Animals-ElephantRhino,Elephant
  Video of Animals-ElephantRhino,Rhino
  Audio at loud volume-SirensHornsCrickets,Sirens
  Audio at loud volume-SirensHornsCrickets,Horns
  Audio at loud volume-SirensHornsCrickets,Crickets
", header = TRUE, sep = ",")

So, for this example, I want to remove "Rhino" from V1 in the first row and "Elephant" from V1 in the second row.

Here is what I tried:

t%>%  
  split(.,.$V1)%>% 
    lapply(.,function(x){(unique(x$V2))})%>%
      lapply(.,function(y){mgsub(pattern=y[[1]],replacement="",names(y))})

This attempt neither changes the longer string nor keeps the unique shorter string.

The result should look like this:

read.table(text = "
V1,V2
  Video of Animals-Elephant,Elephant
  Video of Animals-Rhino,Rhino
  Audio at loud volume-Sirens,Sirens
  Audio at loud volume-Horns,Horns
  Audio at loud volume-Crickets,Crickets
", header = TRUE, sep = ",")
Tags: r, lapply, gsub

1 answer

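Perhaps this? It builds one pattern per row from V2 and uses sub() to replace the whole run of letters containing that word with the word itself: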

mapply(sub, paste0("\\b[A-Za-z]*(", dat$V2, ")[A-Za-z]*\\b"), "\\1", dat$V1)
# \\b[A-Za-z]*(Elephant)[A-Za-z]*\\b    \\b[A-Za-z]*(Rhino)[A-Za-z]*\\b   \\b[A-Za-z]*(Sirens)[A-Za-z]*\\b    \\b[A-Za-z]*(Horns)[A-Za-z]*\\b 
#      "  Video of Animals-Elephant"         "  Video of Animals-Rhino"    "  Audio at loud volume-Sirens"     "  Audio at loud volume-Horns" 
# \\b[A-Za-z]*(Crickets)[A-Za-z]*\\b 
#  "  Audio at loud volume-Crickets" 
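The call above only prints a named vector; it does not modify the data frame. As a minimal sketch of how the result could be written back, assuming the values in V2 contain only letters (so they are safe to paste into a regular expression), the names that mapply() attaches can be dropped with unname(). The trimws() step is an optional addition, not part of the original answer, that strips the leading spaces carried over from the input text.

```r
# Same data as in the answer (leading spaces in V1 kept as in the original).
dat <- data.frame(
  V1 = c("  Video of Animals-ElephantRhino",
         "  Video of Animals-ElephantRhino",
         "  Audio at loud volume-SirensHornsCrickets",
         "  Audio at loud volume-SirensHornsCrickets",
         "  Audio at loud volume-SirensHornsCrickets"),
  V2 = c("Elephant", "Rhino", "Sirens", "Horns", "Crickets"),
  stringsAsFactors = FALSE
)

# One pattern per row: the run of letters that contains this row's V2 value,
# with the V2 value itself in capture group 1.
patterns <- paste0("\\b[A-Za-z]*(", dat$V2, ")[A-Za-z]*\\b")

# Replace each matched run with its captured word, row by row, and
# drop the pattern names that mapply() attaches to the result.
dat$V1 <- unname(mapply(sub, patterns, "\\1", dat$V1))

# Optional (assumption: the leading spaces are not wanted in the result).
dat$V1 <- trimws(dat$V1)

print(dat)
```

Because sub() replaces only the first match, each row is changed at most once, and rows whose V1 does not contain the V2 word are left as they are.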

Data:

dat <- structure(list(V1 = c("  Video of Animals-ElephantRhino", "  Video of Animals-ElephantRhino", "  Audio at loud volume-SirensHornsCrickets", "  Audio at loud volume-SirensHornsCrickets", "  Audio at loud volume-SirensHornsCrickets"), V2 = c("Elephant", "Rhino", "Sirens", "Horns", "Crickets")), class = "data.frame", row.names = c(NA, -5L))
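If the run-together words always follow the last hyphen in V1, as they do in this sample data, a simpler variant is possible. The sketch below is an alternative under that assumption, not the method from the answer: it cuts V1 after its last hyphen and appends V2. It avoids building patterns from the data, so it would also work if V2 contained regex metacharacters, but it does not check that the V2 word actually occurs in V1.

```r
# Same data as above.
dat <- data.frame(
  V1 = c("  Video of Animals-ElephantRhino",
         "  Video of Animals-ElephantRhino",
         "  Audio at loud volume-SirensHornsCrickets",
         "  Audio at loud volume-SirensHornsCrickets",
         "  Audio at loud volume-SirensHornsCrickets"),
  V2 = c("Elephant", "Rhino", "Sirens", "Horns", "Crickets"),
  stringsAsFactors = FALSE
)

# Drop everything after the last hyphen, keeping the hyphen itself.
prefix <- sub("[^-]*$", "", dat$V1)

# Append this row's V2 value to the remaining prefix.
dat$V1 <- paste0(prefix, dat$V2)

print(dat)
```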