How can I use R to find items that are repeated across sets?


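Suppose I have three datasets, each of which is a list of differentially expressed genes. How can I use R to find the genes that appear in all three sets?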

An example of the datasets (each set has hundreds of genes):

Dataset 1: KRAS MAPK1 CYCS ABCD ABCG1 TMEM51

Dataset 2: CYCS GAGE12J TMEM51 ABCG1 MAPK1

Dataset 3: KRAS ABCG1 TMEM51 ALB RGS13 CYCS

The output for this example would be ABCG1, CYCS, and TMEM51, since these are the only genes that appear in all three datasets.

I tried using the dplyr package: `

# Function to extract gene symbols from CSV file
extract_genes <- function(file_path) {
  df <- read.csv(file_path, header = TRUE)  # Read CSV file
  genes <- df$GeneSymbol  # Extract gene symbols column
  return(genes)
}

# File paths for your datasets
file_paths <- c(" Significance 1.csv",
                "Significance 2.csv",
                "Significance 3.csv",
                "Significance 4.csv")

# List to store gene symbols from each dataset
gene_lists <- list()

# Extract gene symbols from each dataset
for (file_path in file_paths) {
  gene_lists[[file_path]] <- extract_genes(file_path)
}

# Find common genes across all datasets
common_genes <- Reduce(intersect, gene_lists)

# Print common genes
print(common_genes)`

I get this result: NULL

However, I know there are genes present in all of the datasets, so this result must be wrong.

r bioinformatics genomics
1 answer

You can use intersect twice here:

d1 <- c("KRAS", "MAPK1", "CYCS", "ABCD", "ABCG1", "TMEM51")
d2 <- c("CYCS", "GAGE12J", "TMEM51", "ABCG1", "MAPK1")
d3 <-  c("KRAS", "ABCG1", "TMEM51", "ALB", "RGS13", "CYCS")

intersect(intersect(d1, d2), d3)

# [1] "CYCS"   "ABCG1"  "TMEM51"

or do it iteratively with Reduce:

Reduce(intersect, list(d1,d2,d3))

# [1] "CYCS"   "ABCG1"  "TMEM51"
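
To make the iterative behaviour more concrete, here is a small sketch using the accumulate argument of base R's Reduce, which keeps every intermediate result of the pairwise intersect calls. The variable name steps is only illustrative.

```r
d1 <- c("KRAS", "MAPK1", "CYCS", "ABCD", "ABCG1", "TMEM51")
d2 <- c("CYCS", "GAGE12J", "TMEM51", "ABCG1", "MAPK1")
d3 <- c("KRAS", "ABCG1", "TMEM51", "ALB", "RGS13", "CYCS")

# accumulate = TRUE returns a list: d1, then intersect(d1, d2),
# then intersect(intersect(d1, d2), d3).
steps <- Reduce(intersect, list(d1, d2, d3), accumulate = TRUE)

steps[[2]]
# [1] "MAPK1"  "CYCS"   "ABCG1"  "TMEM51"

steps[[3]]
# [1] "CYCS"   "ABCG1"  "TMEM51"
```

The last element is the same vector that the nested intersect call produces, so Reduce is simply the version that scales to any number of sets.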

Note that if these are data frames, you can simply do the following:

d1 <- data.frame(gene = c("KRAS", "MAPK1", "CYCS", "ABCD", "ABCG1", "TMEM51"))
d2 <- data.frame(gene = c("CYCS", "GAGE12J", "TMEM51", "ABCG1", "MAPK1"))
d3 <-  data.frame(gene = c("KRAS", "ABCG1", "TMEM51", "ALB", "RGS13", "CYCS"))

intersect(intersect(d1$gene, d2$gene), d3$gene)
# [1] "CYCS"   "ABCG1"  "TMEM51"
# or

Reduce(intersect, list(d1$gene, d2$gene, d3$gene))
# [1] "CYCS"   "ABCG1"  "TMEM51"
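
As for the NULL in the question: the CSV files are not shown, so this is only a guess, but one possible cause is that the column is not named exactly GeneSymbol. In that case df$GeneSymbol is NULL for every file, and Reduce(intersect, ...) over NULLs is NULL as well. The sketch below uses a hypothetical helper, extract_genes_checked, and temporary files standing in for the real CSVs; it stops with a message listing the available columns instead of silently returning NULL.

```r
# Hypothetical replacement for extract_genes(): fail loudly when the
# expected column is missing instead of silently returning NULL.
extract_genes_checked <- function(file_path, column = "GeneSymbol") {
  df <- read.csv(file_path, header = TRUE, stringsAsFactors = FALSE)
  if (!column %in% names(df)) {
    stop("Column '", column, "' not found in ", file_path,
         "; columns present: ", paste(names(df), collapse = ", "))
  }
  # Trim stray whitespace and drop missing or empty symbols.
  genes <- trimws(as.character(df[[column]]))
  unique(genes[!is.na(genes) & nzchar(genes)])
}

# Temporary CSV files stand in for the real datasets (an assumption).
sets <- list(
  c("KRAS", "MAPK1", "CYCS", "ABCD", "ABCG1", "TMEM51"),
  c("CYCS", "GAGE12J", "TMEM51", "ABCG1", "MAPK1"),
  c("KRAS", "ABCG1", "TMEM51", "ALB", "RGS13", "CYCS")
)
paths <- vapply(sets, function(genes) {
  path <- tempfile(fileext = ".csv")
  write.csv(data.frame(GeneSymbol = genes), path, row.names = FALSE)
  path
}, character(1))

# A column whose name does not match makes `$` return NULL.
bad <- data.frame(Gene.Symbol = sets[[1]])
is.null(bad$GeneSymbol)
# [1] TRUE

gene_lists <- lapply(paths, extract_genes_checked)
common_genes <- Reduce(intersect, gene_lists)
print(common_genes)
# [1] "CYCS"   "ABCG1"  "TMEM51"
```

Running names(read.csv(file_path)) on one of the real files shows the actual column names. By default read.csv rewrites a header such as "Gene Symbol" to Gene.Symbol, which would not match GeneSymbol.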