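I'm trying to find all instances in a data frame where "found" and "housing" appear directly next to each other. If the two words occur in the same sentence but with other words between them, I'm not interested.
Example: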
library(tidyverse)
df <- tibble(x = c("I found housing, but I don't have a job yet",
                   "I found clothes, but not housing"))
I tried the following code to pick out only the first row (since it has "found" and "housing" together), but it still returns both rows:
df %>%
filter(str_detect(x, "(?=.*found)(?=.*housing)"))
# A tibble: 2 × 1
x
<chr>
1 I found housing, but I don't have a job yet
2 I found clothes, but not housing
Ideally, I'd like to be able to view these occurrences (as above) and also count how many times they appear in my data frame.
Thank you.
Try
df <- data.frame(
  x = c(
    "I found housing, but I don't have a job yet",
    "I found clothes, but not housing"
  )
)
grepl(
  "(?i)found housing|housing found",
  df$x
) |>
  sum()
# [1] 1
See the regex at https://regex101.com/r/4fviWx/latest
(?i) makes the regex case-insensitive; you can omit it if your text is always lowercase.
found housing|housing found matches the two orderings you want.
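To see why the lookahead pattern from the question returns both rows while the alternation does not, here is a small sketch. The third test string is a made-up example (it is not in the original data) added to show the effect of (?i), and perl = TRUE is used because lookaheads need the PCRE engine.

```r
# Test strings: the first two come from the question, the third is hypothetical.
x <- c(
  "I found housing, but I don't have a job yet",
  "I found clothes, but not housing",
  "Housing found at last"
)

# Each lookahead only asserts that the word occurs somewhere in the string,
# so the words do not have to be adjacent. The match is case-sensitive here.
lookahead <- grepl("(?=.*found)(?=.*housing)", x, perl = TRUE)

# The alternation requires the two words to sit next to each other,
# in either order, and (?i) ignores case.
adjacent <- grepl("(?i)found housing|housing found", x, perl = TRUE)

print(lookahead)
print(adjacent)
```

Note that the pattern has no word boundaries, so it would also match inside longer words; whether that matters depends on your data.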
R code:
grepl() returns a logical vector indicating, for each element, whether the regex matches.
sum() counts the number of TRUE values.
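Since the question also asks to view the matching rows, not just count them, the same logical vector can be reused to subset the data frame. This is a minimal base R sketch; the names hits, matches and n_matches are just illustrative.

```r
df <- data.frame(
  x = c(
    "I found housing, but I don't have a job yet",
    "I found clothes, but not housing"
  )
)

# TRUE for rows where the two words are adjacent, in either order.
hits <- grepl("(?i)found housing|housing found", df$x, perl = TRUE)

# Keep only the matching rows; drop = FALSE keeps the result a data frame
# even though there is only one column.
matches <- df[hits, , drop = FALSE]
print(matches)

# Count the matching rows.
n_matches <- sum(hits)
print(n_matches)
```

If you prefer to stay in the tidyverse, passing the same pattern to str_detect() inside filter() should give the equivalent subset.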