How to identify matching strings between datasets?

Question

I have been trying to use answers from other similar questions, but with no luck. I have two datasets:

#df1:
Gene
ACE
BRCA
HER2

#df2:
Gene       interactors
GP5       ACE, NOS, C456
TP53      NOS, BRCA, NOTCH4

I want to add a column to my first dataset that identifies the genes that appear as interactors in my second dataset.

Desired output:

#df1:
Gene   Matches
ACE      TRUE
BRCA     TRUE
HER2     FALSE

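Currently I am trying df1$Matches <- mapply(grepl, df1$Gene, df2$interactors). This runs, but when I increase the number of genes in df1, the number of matches drops. That doesn't make sense, because I haven't removed any of the genes from the original run, which makes me think it isn't working the way I expect.

I have also tried: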

library(stringr)
df1 %>% 
+     rowwise() %>% 
+     mutate(exists_in_title = str_detect(Gene, df2$interactors))
Error: Column `exists_in_title` must be length 1 (the group size), not 3654
In addition: There were 50 or more warnings (use warnings() to see the first 50)
I also tried a dplyr version of this, which produced an error.
What other approaches could I use to solve this? Any help would be appreciated.

Input data:

dput(df1)
structure(list(Gene = c("ACE", "BRCA", "HER2")), row.names = c(NA, 
-3L), class = c("data.table", "data.frame"))

dput(df2)
structure(list(Gene = c("GP5", "TP53"), interactors = c("ACE, NOS, C456", 
"NOS, BRCA, NOTCH4")), row.names = c(NA, -2L), class = c("data.table", 
"data.frame"))

Answer 1

Here is an answer that combines tidyr and base R. First, we read the data:

text1 <- "Gene
ACE
BRCA
HER2"
text2 <- "Gene|interactors
GP5|ACE, NOS, C456
TP53|NOS, BRCA, NOTCH4"

df1 <- read.csv(text = text1,header = TRUE,stringsAsFactors = FALSE)
df2 <- read.csv(text = text2,header = TRUE,stringsAsFactors = FALSE,sep = "|")
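
The rest of this answer is not shown here, so what follows is only a minimal sketch of one way the matching step could be done, not necessarily the author's method. It assumes the interactors are always comma-separated, and it uses base R only (tidyr::separate_rows could do the splitting step instead). The name all_interactors is introduced purely for illustration.

```r
# Recreate the example data (same values as in the question).
df1 <- data.frame(Gene = c("ACE", "BRCA", "HER2"), stringsAsFactors = FALSE)
df2 <- data.frame(
  Gene = c("GP5", "TP53"),
  interactors = c("ACE, NOS, C456", "NOS, BRCA, NOTCH4"),
  stringsAsFactors = FALSE
)

# Split every interactors string on commas and trim surrounding whitespace,
# giving one flat character vector of all interactor names in df2.
# (all_interactors is a hypothetical helper name, not from the original answer.)
all_interactors <- trimws(unlist(strsplit(df2$interactors, ",", fixed = TRUE)))

# Exact, whole-name membership test: TRUE if the gene appears anywhere in df2.
df1$Matches <- df1$Gene %in% all_interactors

print(df1)
```

On the example data this gives TRUE, TRUE, FALSE for ACE, BRCA and HER2, which is the desired output. This also suggests why the mapply attempt behaves oddly: mapply(grepl, df1$Gene, df2$interactors) pairs the i-th gene with the i-th interactors string (recycling the shorter vector), so each gene is checked against only one row of df2 rather than all of them. In addition, grepl matches substrings, so a gene name that is contained in a longer name would also count as a match, whereas %in% compares whole names.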
