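I have a data frame (dim 2914 x 6) in which one column is a vector combining an animal group and a species abbreviation, e.g. "bird_F.pw", and I have a separate vector of a few species abbreviations, e.g. "F.pw". I am trying to extract all rows of the data frame where the animal group and species abbreviation is "like" one of the abbreviations (i.e., I don't know the prefix). I would like to use operators such as %in% and %like%, but I haven't been able to find a way to produce non-identical matches. Here is an example data frame: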
df<-cbind(
c("A","B","C","D","E"),
c(1:5),
c("insect_F.vp","bird_L.ts","insect_P.qr","insect_V.cl","bird_H.dw"))
colnames(df) <- c("season","survey_id","pollinator")
Here is the vector of abbreviations I want to search for in that data frame:
abbrevs <- c("L.ts","P.qr","H.dw")
My expected result is:
output <- cbind(c("B","C","E"),c(2:3,5),c("bird_L.ts","insect_P.qr","bird_H.dw"))
colnames(output) <- colnames(df)
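Here are two ways, one in base R and one with the stringr package. The main trick is to build a compound pattern of alternatives ("|"), which is done with paste0. The dollar sign matches the end of the string, so each of the abbrevs must end the data being searched.
df <- data.frame(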
season = c("A","B","C","D","E"),
survey_id = 1:5,
pollinator = c("insect_F.vp","bird_L.ts","insect_P.qr","insect_V.cl","bird_H.dw")
)
abbrevs <- c("L.ts","P.qr","H.dw")
pat <- paste0(abbrevs, "$", collapse = "|")
# base R
i <- grepl(pat, df$pollinator)
# package stringr
j <- stringr::str_detect(df$pollinator, pat)
df[i, ]
#> season survey_id pollinator
#> 2 B 2 bird_L.ts
#> 3 C 3 insect_P.qr
#> 5 E 5 bird_H.dw
df[j, ]
#> season survey_id pollinator
#> 2 B 2 bird_L.ts
#> 3 C 3 insect_P.qr
#> 5 E 5 bird_H.dw
Created on 2023-08-24.
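One caveat with the pattern above: the abbreviations contain a ".", which in a regular expression matches any character, so "L.ts$" would also match a value ending in "Lxts". The following is a minimal sketch of a literal alternative, swapping the regular expression for base R's endsWith(). It assumes the abbreviation always follows an underscore; the names pollinator and hits are only illustrative.

```r
abbrevs <- c("L.ts", "P.qr", "H.dw")
pollinator <- c("insect_F.vp", "bird_L.ts", "insect_P.qr", "insect_V.cl", "bird_H.dw")

# Assumption: every value has the form "<group>_<abbreviation>",
# so the literal suffix to look for is "_" followed by the abbreviation.
suffixes <- paste0("_", abbrevs)

# endsWith() compares literally (no regex), one suffix at a time;
# Reduce() combines the per-suffix logical vectors with an element-wise OR.
hits <- Reduce(`|`, lapply(suffixes, function(s) endsWith(pollinator, s)))

pollinator[hits]
```

With a data frame, the same logical vector can be used as a row index, e.g. df[hits, ] when hits is computed from df$pollinator.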
Using the apply family:
# df from the question is a matrix (built with cbind), so convert it first
df <- as.data.frame(df)
df[apply(sapply(abbrevs, function(x) grepl(x, df$pollinator)), 1, any), ]
season survey_id pollinator
2 B 2 bird_L.ts
3 C 3 insect_P.qr
5 E 5 bird_H.dw
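Since the question asks about %in%, here is a hedged sketch of one more option: strip the unknown prefix first, then match exactly. It assumes each value has a single group prefix ending at the first underscore and that the abbreviations themselves contain no underscore; the names species and res are illustrative.

```r
df <- data.frame(
  season = c("A", "B", "C", "D", "E"),
  survey_id = 1:5,
  pollinator = c("insect_F.vp", "bird_L.ts", "insect_P.qr", "insect_V.cl", "bird_H.dw")
)
abbrevs <- c("L.ts", "P.qr", "H.dw")

# Remove everything up to and including the first underscore,
# leaving only the species abbreviation.
species <- sub("^[^_]*_", "", df$pollinator)

# Exact (identical) matching now works, because the prefix is gone.
res <- df[species %in% abbrevs, ]
res
```

On the sample data this selects rows 2, 3 and 5, the same rows as the approaches above.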