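I want to replace the values in df1 with the values from df2, where df2 holds the clean versions of the names that appear in df1. Example: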
df1 <- data.frame(
name = c(
"A. MAHJUM-61365",
"A. MAHJUM-61365. MAHJUM-61365",
"A. RIZAL. AD-11002795",
"A. RIZAL. AD-11002795. RIZAL. AD-11002795",
"ABD. KADIR-60447",
"ABD. KADIR-60447ABD. KADIR-60447",
"ABD. KAHAR-62551",
"ABD. RASYID DS-11002082",
"ABDREAS APUNG @SANY",
"ABDUL AZIS @HYUNDAY",
"ABDUL AZIZ @HYUNDAI",
"ABDUL AZIZ@HYUNDAI"
))
df2 is:
df2 <- data.frame(
name = c(
"A. MAHJUM-61365",
"A. RIZAL. AD-11002795",
"ABD. KADIR-60447",
"ABD. KAHAR-62551",
"ABD. RASYID DS-11002082",
"ABDREAS APUNG @SANY",
"ABDUL AZIS @HYUNDAY"
))
If a value in df1 looks like a value in df2, it should be replaced with the df2 value.
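Since this is substring matching, we can use regex_left_join from fuzzyjoin, which treats each df2 name as a regular expression and matches it against the df1 names: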
library(dplyr)
library(fuzzyjoin)
regex_left_join(df1, df2, by = 'name') %>%
transmute(name = coalesce(name.y, name.x))
Or use a distance-based approach (stringdist_left_join allows an edit distance of up to 2 by default):
stringdist_left_join(df1, df2, by = 'name') %>%
transmute(name = coalesce(name.y, name.x))
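
The two joins above cover different cases: the regex join catches names that contain a clean name as a substring, while the distance join catches small misspellings. As a rough sketch of how both ideas could be combined in base R, the helper below (clean_names is a hypothetical name, not part of fuzzyjoin, and max_dist = 3 is an assumed threshold) first tries a literal substring match and then falls back to the closest clean name by edit distance. When several clean names match as substrings, it simply takes the first one, which is an assumption that may not suit every data set.

```r
# Hypothetical helper: map each messy name to a clean reference name.
clean_names <- function(x, ref, max_dist = 3) {
  vapply(x, function(s) {
    # 1) Literal (non-regex) substring match, so "." means a real dot.
    hit <- ref[vapply(ref, grepl, logical(1), x = s, fixed = TRUE)]
    if (length(hit) > 0) return(hit[[1]])
    # 2) Otherwise pick the closest reference name by edit distance.
    d <- utils::adist(s, ref)[1, ]
    # Keep the original value when nothing is close enough.
    if (min(d) <= max_dist) ref[[which.min(d)]] else s
  }, character(1), USE.NAMES = FALSE)
}

# Small illustration with values taken from the question.
ref   <- c("A. MAHJUM-61365", "ABD. KADIR-60447", "ABDUL AZIS @HYUNDAY")
messy <- c("A. MAHJUM-61365. MAHJUM-61365",
           "ABD. KADIR-60447ABD. KADIR-60447",
           "ABDUL AZIZ@HYUNDAI")
cleaned <- clean_names(messy, ref)
print(cleaned)
```

With the data frames from the question, this would be called as df1$name <- clean_names(df1$name, df2$name). The threshold is worth tuning: a larger max_dist merges more misspellings but raises the risk of merging two different people.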