Return a subset of a data frame based on similar (not identical) elements in a vector?

Question (votes: 0, answers: 2)

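I have a data frame (dim 2914 x 6) in which one column is a vector combining an animal group and a species abbreviation, e.g. "bird_F.pw", and I have a separate vector of several species abbreviations, e.g. "F.pw". I am trying to extract every row of the data frame whose group-plus-abbreviation value is "similar" to one of the abbreviations (i.e., I don't know the prefix). I thought of using operators such as %in% and %like%, but I can't find a way to produce matches that are not identical. Here is an example data frame: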

df <- cbind(c("A","B","C","D","E"), c(1:5), c("insect_F.vp","bird_L.ts","insect_P.qr","insect_V.cl","bird_H.dw"))
colnames(df) <- c("season","survey_id","pollinator")

Here is the vector of abbreviations I want to search for in that data frame:

abbrevs <- c("L.ts","P.qr","H.dw")

My expected result is:

output <- cbind(c("B","C","E"), c(2:3,5), c("bird_L.ts","insect_P.qr","bird_H.dw"))
colnames(output) <- colnames(df)


r string-matching
2 Answers
0 votes
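Here are two ways, one in base R and one with the stringr package. The main trick is to build a composite pattern with alternatives ("|"). This is done with paste0 and its collapse argument. The dollar sign matches the end of a string, so each entry of abbrevs must come at the end of the value being searched.
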
df <- data.frame(
  season = c("A","B","C","D","E"),
  survey_id = 1:5,
  pollinator = c("insect_F.vp","bird_L.ts","insect_P.qr","insect_V.cl","bird_H.dw")
)
abbrevs <- c("L.ts","P.qr","H.dw")

pat <- paste0(abbrevs, "$", collapse = "|")

# base R
i <- grepl(pat, df$pollinator)
# package stringr
j <- stringr::str_detect(df$pollinator, pat)

df[i, ]
#>   season survey_id  pollinator
#> 2      B         2   bird_L.ts
#> 3      C         3 insect_P.qr
#> 5      E         5   bird_H.dw

df[j, ]
#>   season survey_id  pollinator
#> 2      B         2   bird_L.ts
#> 3      C         3 insect_P.qr
#> 5      E         5   bird_H.dw

Created on 2023-08-24 with reprex v2.0.2
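
One caveat with the pattern above: in a regular expression, "." matches any single character, so "L.ts$" would also match a value ending in "Lxts". If the abbreviations should be compared literally, a minimal sketch using base R's endsWith (available since R 3.3.0) avoids regular expressions altogether. The vector x and the value "bird_Lxts" are hypothetical, made up only to show the difference.

```r
abbrevs <- c("L.ts", "P.qr", "H.dw")

# Hypothetical values; "bird_Lxts" is invented to expose the regex dot
x <- c("bird_L.ts", "bird_Lxts", "insect_P.qr")

# Regex version: the unescaped "." matches any character
pat <- paste0(abbrevs, "$", collapse = "|")
regex_hits <- grepl(pat, x)

# Literal version: one endsWith() test per abbreviation, combined with OR
literal_hits <- Reduce(`|`, lapply(abbrevs, function(a) endsWith(x, a)))

regex_hits
#> [1] TRUE TRUE TRUE
literal_hits
#> [1]  TRUE FALSE  TRUE
```

On the example data from the question both versions select the same rows, so the difference only matters if some values differ from an abbreviation only at the position of the dot.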


0 votes

Using the apply family:

df <- as.data.frame(df)

df[apply(sapply(abbrevs, function(x) grepl(x, df$pollinator)), 1, any), ]


  season survey_id  pollinator
2      B         2   bird_L.ts
3      C         3 insect_P.qr
5      E         5   bird_H.dw
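
A variant of the same idea, offered as a sketch rather than as part of the answer: passing fixed = TRUE to grepl treats each abbreviation as a literal string (so "." is not a wildcard), and rowSums on the resulting logical matrix replaces the apply(..., 1, any) step. The names m, keep and result are illustrative only. Like the answer above, this matches the abbreviation anywhere in the value, not only at the end.

```r
df <- data.frame(
  season = c("A", "B", "C", "D", "E"),
  survey_id = 1:5,
  pollinator = c("insect_F.vp", "bird_L.ts", "insect_P.qr",
                 "insect_V.cl", "bird_H.dw")
)
abbrevs <- c("L.ts", "P.qr", "H.dw")

# Logical matrix: one row per data row, one column per abbreviation
m <- sapply(abbrevs, function(a) grepl(a, df$pollinator, fixed = TRUE))

# Keep a row if at least one abbreviation was found in it
keep <- rowSums(m) > 0
result <- df[keep, ]
result
```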
