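I have a table in which the first cell of each row contains at least two strings run together. I need to choose among them and keep only one of them inside the longer string.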
library(qdap)
library(magrittr)
t<- read.table(text="
V1,V2
Video of Animals-ElephantRhino,Elephant
Video of Animals-ElephantRhino,Rhino
Audio at loud volume-SirensHornsCrickets,Sirens
Audio at loud volume-SirensHornsCrickets,Horns
Audio at loud volume-SirensHornsCrickets,Crickets
",
header=T,sep = ",")
So for this example, I want to remove "Rhino" from V1 in the first row and "Elephant" from V1 in the second row.
Here is what I tried:
t%>%
split(.,.$V1)%>%
lapply(.,function(x){(unique(x$V2))})%>%
lapply(.,function(y){mgsub(pattern=y[[1]],replacement="",names(y))})
This attempt neither changes the longer string nor keeps the unique shorter string.
The result should look like this:
read.table(text="
V1,V2
Video of Animals-Elephant,Elephant
Video of Animals-Rhino,Rhino
Audio at loud volume-Sirens,Sirens
Audio at loud volume-Horns,Horns
Audio at loud volume-Crickets,Crickets
",header=T,sep = ",")
Maybe this?
mapply(sub, paste0("\\b[A-Za-z]*(", dat$V2, ")[A-Za-z]*\\b"), "\\1", dat$V1)
# \\b[A-Za-z]*(Elephant)[A-Za-z]*\\b \\b[A-Za-z]*(Rhino)[A-Za-z]*\\b \\b[A-Za-z]*(Sirens)[A-Za-z]*\\b \\b[A-Za-z]*(Horns)[A-Za-z]*\\b
# " Video of Animals-Elephant" " Video of Animals-Rhino" " Audio at loud volume-Sirens" " Audio at loud volume-Horns"
# \\b[A-Za-z]*(Crickets)[A-Za-z]*\\b
# " Audio at loud volume-Crickets"
Data:
dat <- structure(list(V1 = c(" Video of Animals-ElephantRhino", " Video of Animals-ElephantRhino", " Audio at loud volume-SirensHornsCrickets", " Audio at loud volume-SirensHornsCrickets", " Audio at loud volume-SirensHornsCrickets"), V2 = c("Elephant", "Rhino", "Sirens", "Horns", "Crickets")), class = "data.frame", row.names = c(NA, -5L))
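For reference, here is a minimal self-contained sketch of the same idea applied to the table from the question. The names `tbl` and `patterns` are my own (chosen so that `base::t` is not masked), and the sketch assumes the V2 values contain only letters; anything with regex metacharacters would need escaping first. Each pattern matches the whole run of letters that contains the V2 value and captures that value in group 1, so `sub()` can replace the run with just the captured part.

```r
# Rebuild the question's table; `tbl` is a hypothetical name used
# here instead of `t` so that base::t is not masked.
tbl <- read.table(text = "
V1,V2
Video of Animals-ElephantRhino,Elephant
Video of Animals-ElephantRhino,Rhino
Audio at loud volume-SirensHornsCrickets,Sirens
Audio at loud volume-SirensHornsCrickets,Horns
Audio at loud volume-SirensHornsCrickets,Crickets
", header = TRUE, sep = ",", stringsAsFactors = FALSE)

# One pattern per row: a run of letters that contains the V2 value,
# with the V2 value itself captured as group 1.
# Assumption: V2 holds plain letters only (no regex metacharacters).
patterns <- paste0("\\b[A-Za-z]*(", tbl$V2, ")[A-Za-z]*\\b")

# Apply sub(pattern, replacement, x) row by row. USE.NAMES = FALSE drops
# the pattern strings that mapply would otherwise attach as names.
tbl$V1 <- mapply(sub, patterns, "\\1", tbl$V1, USE.NAMES = FALSE)

print(tbl)
```

Because `sub()` replaces only the first match, this relies on the V2 value not appearing inside an earlier word of V1.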