Find the first matching word from a vector in a string column


I need to know which word from a vector appears first in a string. I need to run this on a large data frame with millions of records.

df is my sample data:

df <- data.frame(ID = c(1,2,3),
Text = c("A basket of fruits having apples, green bananas, and peaches",
"A basket of fruits having green bananas, apples, and peaches",
"A basket of fruits having peaches, green bananas, and apples"))

The words I want to match are in a vector:

vec <- c("green bananas", "apples", "peaches")

I want a result column with one value per record, like this:

df$Result 
"apples", "green bananas", "peaches"
Tags: r, regex, string, tidyverse

1 Answer

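You can use regmatches + regexpr, as shown below. The words are joined into a single alternation pattern, and because a regex match is taken at the leftmost possible position, the result is whichever word occurs first in the string: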

transform(
    df,
    Result = regmatches(Text, regexpr(paste0(vec, collapse = "|"), Text))
)
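
One caveat with this form: regmatches drops elements that have no match, so on a large data frame where some rows contain none of the words, the result would be shorter than the number of rows and the assignment would fail. A minimal base R sketch of one way around that is below; the fourth row ("An empty basket") is a hypothetical addition to the question's sample data, included only to show the no-match case.

```r
# Sample data from the question, plus one hypothetical row with no match.
df <- data.frame(
  ID = c(1, 2, 3, 4),
  Text = c("A basket of fruits having apples, green bananas, and peaches",
           "A basket of fruits having green bananas, apples, and peaches",
           "A basket of fruits having peaches, green bananas, and apples",
           "An empty basket"),
  stringsAsFactors = FALSE
)
vec <- c("green bananas", "apples", "peaches")

pattern <- paste0(vec, collapse = "|")
m <- regexpr(pattern, df$Text)   # -1 where nothing matches, NA for NA input

# Start with NA everywhere, then fill in only the rows that matched.
hit <- !is.na(m) & m > 0
df$Result <- NA_character_
df$Result[hit] <- regmatches(df$Text, m)
```

Rows without any of the words end up as NA in Result instead of breaking the column length.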

Or use str_extract from the stringr package (shown here with dplyr):

library(dplyr)
library(stringr)

df %>%
    mutate(Result = str_extract(Text, paste0(vec, collapse = "|")))

This gives

  ID                                                         Text        Result
1  1 A basket of fruits having apples, green bananas, and peaches        apples
2  2 A basket of fruits having green bananas, apples, and peaches green bananas
3  3 A basket of fruits having peaches, green bananas, and apples       peaches
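
If the words in vec might contain regex metacharacters (such as "." or "("), the alternation pattern above would need escaping. As an alternative, here is a rough base R sketch that avoids regular expressions: it finds the literal start position of each word with fixed = TRUE and picks the word with the smallest position in each row. This is a swapped-in technique, not the approach from the answer; the fourth row is hypothetical, and the speed on millions of rows is untested.

```r
# Sample data from the question, plus one hypothetical row with no match.
df <- data.frame(
  ID = c(1, 2, 3, 4),
  Text = c("A basket of fruits having apples, green bananas, and peaches",
           "A basket of fruits having green bananas, apples, and peaches",
           "A basket of fruits having peaches, green bananas, and apples",
           "An empty basket"),
  stringsAsFactors = FALSE
)
vec <- c("green bananas", "apples", "peaches")

# One column per word: start position of its first literal occurrence, or -1.
pos <- vapply(vec,
              function(w) as.integer(regexpr(w, df$Text, fixed = TRUE)),
              integer(nrow(df)))
pos <- matrix(pos, nrow = nrow(df))   # keep a matrix even for a single row

# Mark missing words, remember rows where nothing matched at all.
miss <- is.na(pos) | pos < 0
none <- rowSums(!miss) == 0
pos[miss] <- .Machine$integer.max     # push non-matches to the end

# Column index of the smallest position per row (ties go to the earlier word in vec).
first <- max.col(-pos, ties.method = "first")
first[none] <- NA
df$Result <- vec[first]
```

When two words start at the same position (for example, one is a prefix of the other), this sketch returns whichever comes first in vec, so putting longer words first would be one way to prefer the longer match.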