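I have a data frame that looks like this:
String | Word |
---|---|
tasty red apple number 1 | apple |
tasty red apple and banana | apple |
tasty banana and apple and peach | apple |
tasty banana and peach | banana |
tasty peach and apple | peach |
I want to remove everything that comes after the word given in the Word column, keeping the word itself:
String | Word | After |
---|---|---|
tasty red apple number 1 | apple | tasty red apple |
tasty red apple and banana | apple | tasty red apple |
tasty banana and apple and peach | apple | tasty banana and apple |
tasty banana and peach | banana | tasty banana |
tasty peach and apple | peach | tasty peach |
Does anyone know how to do this?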
string <- c("tasty red apple number 1", "tasty red apple and banana", "tasty banana and apple and peach", "tasty banana and peach", "tasty peach and apple")
word <- c("apple", "apple", "apple", "banana", "peach")
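We can capture the characters up to and including the value of 'Word' as a group (`(...)`) and then use a backreference to that group (`\\1`) as the replacement in `str_replace`. The trailing `.*` matches the remaining characters, which we discard. `str_replace` is also vectorized over the pattern, so we don't need a loop.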
library(dplyr)
library(stringr)
df1 %>%
mutate(After = str_replace(String, sprintf("(.*%s).*", Word), "\\1"))
-output
String Word After
1 tasty red apple number 1 apple tasty red apple
2 tasty red apple and banana apple tasty red apple
3 tasty banana and apple and peach apple tasty banana and apple
4 tasty banana and peach banana tasty banana
5 tasty peach and apple peach tasty peach
df1 <- structure(list(String = c("tasty red apple number 1",
"tasty red apple and banana",
"tasty banana and apple and peach", "tasty banana and peach",
"tasty peach and apple"), Word = c("apple", "apple", "apple",
"banana", "peach")), class = "data.frame", row.names = c(NA,
-5L))
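One detail of the pattern above that is easy to miss: the `.*` inside the capture group is greedy, so when the word occurs more than once the group extends to its last occurrence. A minimal sketch of the difference, assuming a made-up input string that is not part of the question's data; the lazy `.*?` variant is shown only for comparison:

```r
library(stringr)

# Hypothetical input: the target word "apple" occurs twice.
x <- "tasty apple and apple pie"

# Greedy `.*`: the group grows as far as possible, so it ends at the LAST "apple".
greedy <- str_replace(x, "(.*apple).*", "\\1")

# Lazy `.*?`: the group grows as little as possible, so it ends at the FIRST "apple".
lazy <- str_replace(x, "(.*?apple).*", "\\1")

print(greedy)
# [1] "tasty apple and apple"
print(lazy)
# [1] "tasty apple"
```

For the data in the question each word appears only once per string, so both variants give the same result there.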
Use a lookbehind in `gsub` together with `mapply` to remove the unwanted part of each string.
transform(dat, After=mapply(\(x, y) gsub(sprintf('(?<=%s).*', x), '', y, perl=TRUE), Word, String))
# String Word After
# 1 tasty red apple number 1 apple tasty red apple
# 2 tasty red apple and banana apple tasty red apple
# 3 tasty banana and apple and peach apple tasty banana and apple
# 4 tasty banana and peach banana tasty banana
# 5 tasty peach and apple peach tasty peach
Data:
dat <- structure(list(String = c("tasty red apple number 1", "tasty red apple and banana",
"tasty banana and apple and peach", "tasty banana and peach",
"tasty peach and apple"), Word = c("apple", "apple", "apple",
"banana", "peach")), class = "data.frame", row.names = c(NA,
-5L))
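To illustrate how the lookbehind behaves on edge cases, here is a small sketch on made-up inputs (not the question's data). It uses `function(...)` instead of the `\(x, y)` shorthand so that it also runs on R versions before 4.1, and `sub` instead of `gsub`, which should be equivalent here because the match always runs to the end of the string:

```r
# Hypothetical inputs: one string repeats the word, the other does not contain it.
s <- c("tasty apple and apple pie", "tasty peach")
w <- c("apple", "apple")

# For each (word, string) pair, delete everything from the first position
# that is immediately preceded by the word.
after <- mapply(
  function(word, string) sub(sprintf("(?<=%s).*", word), "", string, perl = TRUE),
  w, s, USE.NAMES = FALSE
)

print(after)
# [1] "tasty apple" "tasty peach"
```

The lookbehind cuts after the first occurrence of the word, and a string that does not contain the word is returned unchanged.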
Try this:
df1 %>%
mutate(After = str_replace(String, str_c("(.*\\b", Word, "\\b).*"), "\\1"))
String Word After
1 tasty red apple number 1 apple tasty red apple
2 tasty red apple and banana apple tasty red apple
3 tasty banana and apple and peach apple tasty banana and apple
4 tasty banana and peach banana tasty banana
5 tasty peach and apple peach tasty peach
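Here, we (i) wrap `Word` in word boundaries (`\\b`) so that larger words containing the `Word` value (e.g., "dapple" for "apple") are not matched. Then (ii) we enclose that substring in parentheses to make it a capture group, and (iii) refer to it in the replacement argument of `str_replace`, while whatever the `.*` after the capture group matches is dropped.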
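A short sketch of what the word boundaries change, using made-up strings built around the "dapple" example (these inputs and the variable names are illustrative assumptions, not from the question):

```r
library(stringr)

# Hypothetical inputs where "apple" also appears inside a longer word.
x <- c("tasty apple and dapple pie", "tasty pineapple juice")
word <- "apple"

# Without boundaries, "apple" inside "dapple" / "pineapple" counts as a match.
no_boundary <- str_replace(x, str_c("(.*", word, ").*"), "\\1")

# With \\b on both sides, only the standalone word "apple" matches.
with_boundary <- str_replace(x, str_c("(.*\\b", word, "\\b).*"), "\\1")

print(no_boundary)
# [1] "tasty apple and dapple" "tasty pineapple"
print(with_boundary)
# [1] "tasty apple"           "tasty pineapple juice"
```

In the second string there is no standalone "apple", so the bounded pattern does not match and `str_replace` returns the string as is.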