Using stringr to find consecutive strings in a data frame


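I am trying to find all instances in a data frame where "found" and "housing" appear directly next to each other. I am not interested in cases where both words occur in the same sentence but with other words between them.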

Example:

library(tidyverse)  # loads tibble, dplyr, and stringr

df <- tibble(x = c("I found housing, but I don't have a job yet",
                   "I found clothes, but not housing"))

I tried the following code to pick out only the first row (because it has "found" and "housing" together), but it still returns both rows:

df %>% 
  filter(str_detect(x, "(?=.*found)(?=.*housing)"))
# A tibble: 2 × 1
x                                          
<chr>                                      
1 I found housing, but I don't have a job yet
2 I found clothes, but not housing 

Ideally, I would be able to view these occurrences (as in the output above) and also count how many times they appear in my data frame.

Thank you.

r regex apply stringr
1 Answer

Try:

df <- data.frame(
  x = c(
    "I found housing, but I don't have a job yet", 
    "I found clothes, but not housing"
  )
)

grepl(
  "(?i)found housing|housing found",
  df$x
  ) |> 
  sum()

# [1] 1

See the regex at https://regex101.com/r/4fviWx/latest

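(?i)
makes the regex case-insensitive. You can omit it if your text is always lowercase.

found housing|housing found
matches the two orderings you want: the two words directly next to each other, separated by a single space.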
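To see why the pattern from the question returned both rows, it may help to compare it side by side with the alternation. A lookahead such as (?=.*found) only asserts that the word occurs somewhere later in the string, so two lookaheads together check for presence, not adjacency. The following is a minimal sketch in base R; the variable names are illustrative, and perl = TRUE is passed because base R needs it for lookaheads.

```r
x <- c(
  "I found housing, but I don't have a job yet",
  "I found clothes, but not housing"
)

# Each lookahead only checks that its word occurs somewhere in the string,
# so both sentences match, whatever lies between the two words.
lookahead_hits <- grepl("(?=.*found)(?=.*housing)", x, perl = TRUE)

# The alternation requires the two words to be adjacent, in either order,
# so only the first sentence matches.
adjacent_hits <- grepl("(?i)found housing|housing found", x, perl = TRUE)

print(lookahead_hits)
print(adjacent_hits)
```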


R code:

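grepl()
returns a logical vector indicating, for each element, whether the regex matches.

sum()
counts the number of TRUE values, i.e. the number of rows that match.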
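Since the question uses stringr and dplyr, the same idea can probably be written in that style as well. This is a sketch rather than the answerer's own code: it assumes the tibble from the question, and the names pattern, matches, n_rows, and n_total are made up for illustration. It covers both parts of the request, viewing the matching rows and counting them.

```r
library(dplyr)
library(stringr)
library(tibble)

df <- tibble(x = c(
  "I found housing, but I don't have a job yet",
  "I found clothes, but not housing"
))

# regex() with ignore_case = TRUE plays the role of the inline (?i) flag.
pattern <- regex("found housing|housing found", ignore_case = TRUE)

# View the rows in which the two words are adjacent.
matches <- df %>%
  filter(str_detect(x, pattern))

# Number of rows that contain at least one match.
n_rows <- sum(str_detect(df$x, pattern))

# Total number of matches, counting repeats within the same row.
n_total <- sum(str_count(df$x, pattern))

print(matches)
```

str_detect() answers "does this row match at all", while str_count() counts every occurrence, so the two totals differ only when a phrase repeats within one row. If partial words such as "housings" should not count, wrapping the pattern in \\b word boundaries is one option.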