How do I find whether the text strings in one column appear in another column?


Here is the sample data:

 df1 <- c("Board of Accountancy", "Board of Economists", "Board of Medicine")
 df2 <- c("State Board of Accountancy", "The State Board of Economists", "State Board of Law")

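The task at hand has two parts. First, search df2 for the text strings found in df1. If there is no match, leave the string as it is, giving the final result shown below. This relates to a question I asked yesterday, but on closer inspection my first job is to find whether the names in df1 are found in df2.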

df3: "State Board of Accountancy", "The State Board of Economists", "State Board of Law", "Board of Medicine"
r dplyr fuzzy-search
1 Answer
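c(df2, df1[colSums(sapply(df1, grepl, df2)) < 1])
# [1] "State Board of Accountancy"    "The State Board of Economists" "State Board of Law"            "Board of Medicine"
df3
# [1] "State Board of Accountancy"    "The State Board of Economists" "State Board of Law"            "Board of Medicine"

Walkthrough:

  • grepl by itself accepts only one pattern, so we need to iterate over each pattern; we do that with sapply.
  • Because sapply returns a matrix (one column per df1 pattern, one row per df2 element), we need to check whether anything in a column (one per df1 entry) matches; colSums(.) < 1 (aka == 0) does that, meaning nothing matched. Subsetting with df1[..] then gives the df1 entries for which no match was found. (rowSums would count per df2 element instead; it gives the same answer on this sample only because both vectors have three elements and the unmatched ones sit in the same position.)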
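The steps can also be run one at a time to see the intermediate values. This is a minimal sketch on the sample data; the names m, found and res are only illustrative. Note that sapply puts each df1 pattern's result in its own column, so the per-name match count is a column sum.

```r
df1 <- c("Board of Accountancy", "Board of Economists", "Board of Medicine")
df2 <- c("State Board of Accountancy", "The State Board of Economists", "State Board of Law")

# Logical matrix: one column per df1 pattern, one row per df2 element
m <- sapply(df1, grepl, df2)
dim(m)

# TRUE for each df1 name that occurs somewhere in df2
found <- colSums(m) > 0
found

# Keep all of df2, then append the df1 names that were not found
res <- c(df2, df1[!found])
res
```

The found vector answers the first part of the question (is each df1 name present in df2?), and res is the combined result.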

Corrected data:

df1 <- c("Board of Accountancy", "Board of Economists", "Board of Medicine")
df2 <- c("State Board of Accountancy", "The State Board of Economists", "State Board of Law")
df3 <- c("State Board of Accountancy", "The State Board of Economists", "State Board of Law", "Board of Medicine")
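
One caveat worth sketching: grepl treats its pattern as a regular expression by default, so a name containing characters such as parentheses or dots may not match as expected. The variant below is an assumption-laden sketch rather than part of the original answer; append_unmatched is a hypothetical helper name, and it uses fixed = TRUE so that each name is matched as literal text.

```r
# append_unmatched() is a hypothetical helper, not from the original answer
append_unmatched <- function(patterns, targets) {
  # For each pattern: is it a literal substring of any target?
  found <- vapply(
    patterns,
    function(p) any(grepl(p, targets, fixed = TRUE)),
    logical(1)
  )
  # Keep every target, then append the patterns with no match
  c(targets, patterns[!found])
}

df1 <- c("Board of Accountancy", "Board of Economists", "Board of Medicine")
df2 <- c("State Board of Accountancy", "The State Board of Economists", "State Board of Law")
result <- append_unmatched(df1, df2)
result
```

Using vapply with logical(1) instead of sapply keeps the return type predictable even when patterns is empty.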