Here is the sample data:
df1 <- c("Board of Accountancy", "Board of Economists", "Board of Medicine")
df2 <- c("State Board of Accountancy", "The State Board of Economists", "State Board of Law")
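The task at hand has two parts. First, search df2 for the text strings found in df1. If a string is not found in df1, leave it alone, so that the end result looks like the one below. This is related to a question I asked yesterday, but on closer inspection my first job is to find out whether the names in df1 are found in df2.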
df3: "State Board of Accountancy", "The State Board of Economists", "State Board of Law", "Board of Medicine"
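c(df2, df1[colSums(sapply(df1, grepl, df2)) < 1])
# [1] "State Board of Accountancy" "The State Board of Economists" "State Board of Law" "Board of Medicine"
df3
# [1] "State Board of Accountancy" "The State Board of Economists" "State Board of Law" "Board of Medicine"
Walk-through:
grepl itself accepts only one pattern, so we need to iterate over each pattern; we do that with sapply.
sapply(.) returns a matrix with one column per df1 pattern and one row per df2 element.
We need to know whether anything in a column (that is, for each df1 entry) matched; colSums(.) < 1 (a.k.a. == 0) does this, where TRUE means nothing matched.
Subsetting with df1[..] then gives the df1 entries for which no match was found.
Corrected data: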
df1 <- c("Board of Accountancy", "Board of Economists", "Board of Medicine")
df2 <- c("State Board of Accountancy", "The State Board of Economists", "State Board of Law")
df3 <- c("State Board of Accountancy", "The State Board of Economists", "State Board of Law", "Board of Medicine")
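The per-pattern check can also be written without building a matrix at all, which avoids having to think about rows versus columns. The following is only a sketch of that idea, not the original answer's code: the variable names found and result are made up for illustration, and fixed = TRUE is an assumption that the names should be matched as literal text rather than as regular expressions.

```r
df1 <- c("Board of Accountancy", "Board of Economists", "Board of Medicine")
df2 <- c("State Board of Accountancy", "The State Board of Economists", "State Board of Law")

# For each df1 entry, TRUE if it occurs as a substring of any df2 entry.
# fixed = TRUE (an assumption) treats the names as literal text, not regex.
found <- vapply(df1, function(p) any(grepl(p, df2, fixed = TRUE)),
                logical(1), USE.NAMES = FALSE)

# Keep all of df2 and append the df1 entries that matched nothing.
result <- c(df2, df1[!found])
result
```

Because vapply returns exactly one logical value per df1 entry, found always has the same length as df1, even when df1 and df2 differ in length.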