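Suppose I have three datasets, each a list of differentially expressed genes. How can I use R to find the genes that appear in all three sets?
An example of the datasets:
(each set has hundreds of genes)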
Dataset 1: KRAS MAPK1 CYCS ABCD ABCG1 TMEM51
Dataset 2: CYCS GAGE12J TMEM51 ABCG1 MAPK1
Dataset 3: KRAS ABCG1 TMEM51 ALB RGS13 CYCS
The output for this example would be ABCG1, CYCS, and TMEM51, because these are the only genes that appear in all three datasets.
I tried using the dplyr package: `
# Function to extract gene symbols from CSV file
extract_genes <- function(file_path) {
  df <- read.csv(file_path, header = TRUE) # Read CSV file
  genes <- df$GeneSymbol # Extract gene symbols column
  return(genes)
}
# File paths for your datasets
file_paths <- c(" Significance 1.csv",
                "Significance 2.csv",
                "Significance 3.csv",
                "Significance 4.csv")
# List to store gene symbols from each dataset
gene_lists <- list()
# Extract gene symbols from each dataset
for (file_path in file_paths) {
  gene_lists[[file_path]] <- extract_genes(file_path)
}
# Find common genes across all datasets
common_genes <- Reduce(intersect, gene_lists)
# Print common genes
print(common_genes)`
I get this result: NULL
However, I know there are genes present in all of the datasets, so this result must be wrong.
You can apply `intersect` twice here, nesting one call inside the other:
d1 <- c("KRAS", "MAPK1", "CYCS", "ABCD", "ABCG1", "TMEM51")
d2 <- c("CYCS", "GAGE12J", "TMEM51", "ABCG1", "MAPK1")
d3 <- c("KRAS", "ABCG1", "TMEM51", "ALB", "RGS13", "CYCS")
intersect(intersect(d1, d2), d3)
# [1] "CYCS" "ABCG1" "TMEM51"
Or use `Reduce`:
Reduce(intersect, list(d1,d2,d3))
# [1] "CYCS" "ABCG1" "TMEM51"
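To see what `Reduce` is doing with `intersect`, it can help to keep the intermediate results. The following is a minimal sketch using the same example vectors; `accumulate = TRUE` is a standard argument of base R's `Reduce`, and the variable name `steps` is only illustrative.

```r
d1 <- c("KRAS", "MAPK1", "CYCS", "ABCD", "ABCG1", "TMEM51")
d2 <- c("CYCS", "GAGE12J", "TMEM51", "ABCG1", "MAPK1")
d3 <- c("KRAS", "ABCG1", "TMEM51", "ALB", "RGS13", "CYCS")

# accumulate = TRUE returns every intermediate result of the fold:
# steps[[1]] is d1, steps[[2]] is intersect(d1, d2),
# steps[[3]] is intersect(intersect(d1, d2), d3)
steps <- Reduce(intersect, list(d1, d2, d3), accumulate = TRUE)

steps[[2]]
# [1] "MAPK1"  "CYCS"   "ABCG1"  "TMEM51"
steps[[3]]
# [1] "CYCS"   "ABCG1"  "TMEM51"
```

The last element is the same as the nested `intersect` call, so the two approaches should agree for any number of input vectors.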
Note that if these are data frames, you only need to do the following:
d1 <- data.frame(gene = c("KRAS", "MAPK1", "CYCS", "ABCD", "ABCG1", "TMEM51"))
d2 <- data.frame(gene = c("CYCS", "GAGE12J", "TMEM51", "ABCG1", "MAPK1"))
d3 <- data.frame(gene = c("KRAS", "ABCG1", "TMEM51", "ALB", "RGS13", "CYCS"))
intersect(intersect(d1$gene, d2$gene), d3$gene)
# [1] "CYCS" "ABCG1" "TMEM51"
# or
Reduce(intersect, list(d1$gene, d2$gene, d3$gene))
# [1] "CYCS" "ABCG1" "TMEM51"
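As for the `NULL` in the question, one possible explanation (an assumption, since the CSV files are not shown) is that the column is not actually named `GeneSymbol`. In that case `df$GeneSymbol` is `NULL`, assigning `NULL` into a list adds nothing, and `Reduce` over an empty list returns `NULL`. The sketch below reproduces that with a made-up data frame; the column name `gene` and the helper `extract_genes_checked` are hypothetical.

```r
# Hypothetical data frame whose column is called "gene", not "GeneSymbol"
df <- data.frame(gene = c("KRAS", "CYCS", "TMEM51"))

is.null(df$GeneSymbol)
# [1] TRUE

gene_lists <- list()
gene_lists[["file1"]] <- df$GeneSymbol  # assigning NULL adds no element
length(gene_lists)
# [1] 0

common <- Reduce(intersect, gene_lists)  # empty list in, NULL out
is.null(common)
# [1] TRUE

# Hypothetical helper: fail loudly when the expected column is missing
extract_genes_checked <- function(df, column = "GeneSymbol") {
  if (!column %in% names(df)) {
    stop("Column '", column, "' not found; available columns: ",
         paste(names(df), collapse = ", "))
  }
  # Trim stray whitespace and drop duplicate symbols
  unique(trimws(as.character(df[[column]])))
}

extract_genes_checked(df, column = "gene")
# [1] "KRAS"   "CYCS"   "TMEM51"
```

Running `names(read.csv(file_path))` on one of the real files would show whether this is what is happening.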