Replace everything after a word given in another column

Question (votes: 0, answers: 3)

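I have a data frame that looks like this:

String                            Word
tasty red apple number 1          apple
tasty red apple and banana        apple
tasty banana and apple and peach  apple
tasty banana and peach            banana
tasty peach and apple             peach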

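I want to remove everything that follows the word given in the Word column, keeping the word itself:

String                            Word    After
tasty red apple number 1          apple   tasty red apple
tasty red apple and banana        apple   tasty red apple
tasty banana and apple and peach  apple   tasty banana and apple
tasty banana and peach            banana  tasty banana
tasty peach and apple             peach   tasty peach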

Does anyone know how to do this?

string <- c("tasty red apple number 1", "tasty red apple and banana", "tasty banana and apple and peach", "tasty banana and peach", "tasty peach and apple")
word <- c("apple", "apple", "apple", "banana", "peach")

Tags: r, stringr

3 Answers

0 votes

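We can capture the characters up to and including the value of Word as a group, (...), and then refer to that group with the backreference \\1 in the replacement argument of str_replace. The trailing .* matches the remaining characters, which we want to discard. str_replace is also vectorized over its pattern argument, so we don't need a loop.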

library(dplyr)
library(stringr)
df1 %>%
   mutate(After = str_replace(String, sprintf("(.*%s).*", Word), "\\1"))

Output

                          String   Word                  After
1         tasty red apple number 1  apple        tasty red apple
2       tasty red apple and banana  apple        tasty red apple
3 tasty banana and apple and peach  apple tasty banana and apple
4           tasty banana and peach banana           tasty banana
5            tasty peach and apple  peach            tasty peach

Data

df1 <- structure(list(String = c("tasty red apple number 1",
 "tasty red apple and banana", 
"tasty banana and apple and peach", "tasty banana and peach", 
"tasty peach and apple"), Word = c("apple", "apple", "apple", 
"banana", "peach")), class = "data.frame", row.names = c(NA, 
-5L))
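
One detail worth checking with this pattern: the .* inside the group is greedy, so if the word occurs more than once, everything up to its last occurrence is kept. The sketch below contrasts the greedy form with a lazy .*? variant, which stops at the first occurrence. It assumes stringr is installed; the string s is a made-up example, not part of the question's data.

```r
library(stringr)

# Hypothetical input in which the word occurs twice
s <- "tasty apple and red apple pie"
w <- "apple"

# Greedy .* inside the group: keeps everything up to the LAST occurrence
greedy <- str_replace(s, sprintf("(.*%s).*", w), "\\1")

# Lazy .*? inside the group: keeps everything up to the FIRST occurrence
lazy <- str_replace(s, sprintf("(.*?%s).*", w), "\\1")

print(greedy)  # [1] "tasty apple and red apple"
print(lazy)    # [1] "tasty apple"
```

For the data in the question each word occurs once per string, so both forms give the same result there.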

0 votes

Use a lookbehind in gsub, together with mapply, to remove the unwanted part of each string.

transform(dat, After=mapply(\(x, y) gsub(sprintf('(?<=%s).*',  x), '', y, perl=TRUE), Word, String))
#                             String   Word                  After
# 1         tasty red apple number 1  apple        tasty red apple
# 2       tasty red apple and banana  apple        tasty red apple
# 3 tasty banana and apple and peach  apple tasty banana and apple
# 4           tasty banana and peach banana           tasty banana
# 5            tasty peach and apple  peach            tasty peach

Data:

dat <- structure(list(String = c("tasty red apple number 1", "tasty red apple and banana", 
"tasty banana and apple and peach", "tasty banana and peach", 
"tasty peach and apple"), Word = c("apple", "apple", "apple", 
"banana", "peach")), class = "data.frame", row.names = c(NA, 
-5L))
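
As a possible variation on the same idea, the lookbehind can also be used with stringr, whose functions are vectorized over both the string and the pattern, so mapply is not needed. This is only a sketch and assumes stringr is installed; the data frame is rebuilt here so the block runs on its own.

```r
library(stringr)

# Same example data as in the question
dat <- data.frame(
  String = c("tasty red apple number 1", "tasty red apple and banana",
             "tasty banana and apple and peach", "tasty banana and peach",
             "tasty peach and apple"),
  Word = c("apple", "apple", "apple", "banana", "peach")
)

# (?<=word) asserts that the match starts right after the word;
# .* then matches the rest of the string, which str_remove() deletes.
# One pattern is built per row, and str_remove() pairs them row by row.
dat$After <- str_remove(dat$String, sprintf("(?<=%s).*", dat$Word))

print(dat)
```

Unlike the greedy capture-group pattern, the lookbehind cuts after the first occurrence of the word, which only matters when a word appears more than once in a string.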

0 votes

Try this:

df1 %>%
  mutate(After = str_replace(String, str_c("(.*\\b", Word, "\\b).*"), "\\1"))
                            String   Word                  After
1         tasty red apple number 1  apple        tasty red apple
2       tasty red apple and banana  apple        tasty red apple
3 tasty banana and apple and peach  apple tasty banana and apple
4           tasty banana and peach banana           tasty banana
5            tasty peach and apple  peach            tasty peach

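Here, we (i) wrap Word in word boundaries, \\b, so that a larger word containing the Word value (for example "dapple" when the word is "apple") is not matched. Then (ii) we put that part of the pattern in parentheses to make it a capture group, and (iii) refer to it in the replacement argument of str_replace, while anything after the capture group (matched by .*) is dropped.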
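
To see what the word boundaries change, here is a small sketch comparing the pattern with and without \\b. It assumes stringr is installed; the string s is an invented example built around the "dapple" case mentioned above.

```r
library(stringr)

# Hypothetical input: "apple" also appears inside the larger word "dapple"
s <- "red apple on a dapple horse"
w <- "apple"

# Without boundaries, the greedy match runs up to the "apple" inside "dapple"
no_boundary <- str_replace(s, str_c("(.*", w, ").*"), "\\1")

# With \\b on both sides, only the standalone word "apple" can match
with_boundary <- str_replace(s, str_c("(.*\\b", w, "\\b).*"), "\\1")

print(no_boundary)    # [1] "red apple on a dapple"
print(with_boundary)  # [1] "red apple"
```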
