Replace everything after the word given in another column

Question

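I have a data frame that looks like this:

String	Word
tasty red apple number 1	apple
tasty red apple and banana	apple
tasty banana and apple and peach	apple
tasty banana and peach	banana
tasty peach and apple	peach

I want to remove everything that comes after the word given in the Word column, keeping that word itself:

String	Word	After
tasty red apple number 1	apple	tasty red apple
tasty red apple and banana	apple	tasty red apple
tasty banana and apple and peach	apple	tasty banana and apple
tasty banana and peach	banana	tasty banana
tasty peach and apple	peach	tasty peach

Does anyone know how to do this?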

string <- c("tasty red apple number 1", "tasty red apple and banana", "tasty banana and apple and peach", "tasty banana and peach", "tasty peach and apple")
word <- c("apple", "apple", "apple", "banana", "peach")

Answer 1

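We can capture the characters up to and including the value of Word as a group, (...), and then use the backreference \\1 to that group in the replacement argument of str_replace. The trailing .* matches the rest of the characters, which we discard. str_replace is also vectorized over the pattern, so no loop is needed.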

library(dplyr)
library(stringr)
df1 %>%
   mutate(After = str_replace(String, sprintf("(.*%s).*", Word), "\\1"))

Output

                          String   Word                  After
1         tasty red apple number 1  apple        tasty red apple
2       tasty red apple and banana  apple        tasty red apple
3 tasty banana and apple and peach  apple tasty banana and apple
4           tasty banana and peach banana           tasty banana
5            tasty peach and apple  peach            tasty peach

Data

df1 <- structure(list(String = c("tasty red apple number 1",
 "tasty red apple and banana", 
"tasty banana and apple and peach", "tasty banana and peach", 
"tasty peach and apple"), Word = c("apple", "apple", "apple", 
"banana", "peach")), class = "data.frame", row.names = c(NA, 
-5L))
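
One point the pattern above leaves implicit is that .* is greedy, so when the word occurs more than once in a string, everything up to its last occurrence is kept. The sketch below contrasts this with a lazy .*?, which stops at the first occurrence. The example string is made up for illustration and is not part of the original data.

```r
library(stringr)

# Hypothetical input: the word occurs twice in the string.
string <- "tasty apple and red apple pie"
word   <- "apple"

# Greedy .* keeps everything up to the LAST occurrence of the word.
greedy <- str_replace(string, sprintf("(.*%s).*", word), "\\1")
greedy
# [1] "tasty apple and red apple"

# Lazy .*? keeps everything up to the FIRST occurrence of the word.
lazy <- str_replace(string, sprintf("(.*?%s).*", word), "\\1")
lazy
# [1] "tasty apple"
```

Which of the two is wanted depends on the data; for the strings in the question each word occurs once, so both give the same result.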

Answer 2

Use a lookbehind in gsub together with mapply to remove the unwanted part of each string.

transform(dat, After=mapply(\(x, y) gsub(sprintf('(?<=%s).*',  x), '', y, perl=TRUE), Word, String))
#                             String   Word                  After
# 1         tasty red apple number 1  apple        tasty red apple
# 2       tasty red apple and banana  apple        tasty red apple
# 3 tasty banana and apple and peach  apple tasty banana and apple
# 4           tasty banana and peach banana           tasty banana
# 5            tasty peach and apple  peach            tasty peach

Data

dat <- structure(list(String = c("tasty red apple number 1", "tasty red apple and banana", 
"tasty banana and apple and peach", "tasty banana and peach", 
"tasty peach and apple"), Word = c("apple", "apple", "apple", 
"banana", "peach")), class = "data.frame", row.names = c(NA, 
-5L))
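
Both regex-based approaches interpolate Word directly into a pattern, so a word containing metacharacters such as . or + would be interpreted as a regex rather than as literal text. As a rough alternative, assuming literal matching is what is wanted, the word can be located with fixed = TRUE and the string cut with substr. The helper name cut_after_word and the sample inputs are hypothetical, not taken from the answers.

```r
# Hypothetical helper: keep each string up to and including the first
# literal occurrence of the corresponding word.
cut_after_word <- function(string, word) {
  mapply(function(s, w) {
    # fixed = TRUE treats w as plain text, not as a regular expression
    pos <- as.integer(regexpr(w, s, fixed = TRUE))
    if (pos == -1L) return(s)  # word not found: leave the string unchanged
    substr(s, 1L, pos + nchar(w) - 1L)
  }, string, word, USE.NAMES = FALSE)
}

res <- cut_after_word(
  c("price is 1.5 dollars", "tasty banana and peach", "no match here"),
  c("1.5", "banana", "apple")
)
res
# [1] "price is 1.5"  "tasty banana"  "no match here"
```

Like the lookbehind version, this cuts after the first occurrence of the word.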

Answer 3

Try this:

df1 %>%
  mutate(After = str_replace(String, str_c("(.*\\b", Word, "\\b).*"), "\\1"))
                            String   Word                  After
1         tasty red apple number 1  apple        tasty red apple
2       tasty red apple and banana  apple        tasty red apple
3 tasty banana and apple and peach  apple tasty banana and apple
4           tasty banana and peach banana           tasty banana
5            tasty peach and apple  peach            tasty peach

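Here we (i) wrap Word in word boundaries, \\b, to prevent larger words that contain the Word value (e.g. "dapple" for "apple") from being matched. Then (ii) we enclose that substring in parentheses to make it a capture group, and (iii) refer to it in the replacement argument of str_replace, so that whatever follows the capture group (matched by .*) is dropped.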
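
To see what the word boundaries change, here is a small sketch on a made-up string in which "apple" also appears inside the longer word "dapple". The input is an assumption for illustration only.

```r
library(stringr)

# Hypothetical input: "apple" appears on its own and inside "dapple".
string <- "tasty apple and dapple pie"
word   <- "apple"

# Without \\b, the greedy match runs up to the "apple" inside "dapple".
no_boundary <- str_replace(string, str_c("(.*", word, ").*"), "\\1")
no_boundary
# [1] "tasty apple and dapple"

# With \\b on both sides, only the standalone word counts as a match.
with_boundary <- str_replace(string, str_c("(.*\\b", word, "\\b).*"), "\\1")
with_boundary
# [1] "tasty apple"
```

Note that the trailing \\b also means a plural such as "apples" would not match, in which case the string is returned unchanged.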
