R one-to-many mapping

Question (votes: 0, answers: 2)

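meth.genes is a character vector of gene names corresponding to the row names of the meth data frame. The mapping is one-to-many, however: one gene can map to multiple row names in meth. After a series of downstream analyses, I matched these gene names one-to-one to another set of IDs (Ensembl IDs). I now want to match these Ensembl IDs back to the row names of meth, appending an incremental suffix (e.g. ".1", ".2") to an ID wherever it is duplicated.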

meth.genes <- genes.mapped$nearestGeneSymbol
bm.meth <- getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"),
                filters = "hgnc_symbol", mart = ensembl, values=meth.genes)
idx.meth <- meth.genes %>% match(table = bm.meth$hgnc_symbol)
meth.ensembl <- bm.meth$ensembl_gene_id[bm.meth$hgnc_symbol %in% meth.genes]
rownames(meth) <- meth.ensembl

Traceback:

Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length

Something like this (pseudocode):

for i in length(duplicated(meth.ensembl)) {
    paste0(rownames(meth), ".", i)
}

Data:

> dput(meth[1:3,1:3])
structure(list(TCGA.2K.A9WE.01A = c(0.461440642939772, 0.143910373119058, 
0.847164847154162), TCGA.2Z.A9J1.01A = c(0.595894468074615, 0.0807243779293262, 
0.867305510246114), TCGA.2Z.A9J3.01A = c(0.553849599144766, 0.0642332527783939, 
0.917290578229414)), row.names = c("cg00000029", "cg00000165", 
"cg00000236"), class = "data.frame")

genes.mapped

> dput(genes.mapped[1:3,])
structure(list(queryHits = 1:3, subjectHits = c(17721L, 11282L, 
20626L), distance = c(237L, 11879L, 0L), nearestGeneSymbol = c("RBL2", 
"BARHL2", "VDAC3")), row.names = c("cg00000029", "cg00000165", 
"cg00000236"), class = "data.frame")

idx.meth

> dput(idx.meth[1:3])
c(3185L, 361L, 4196L)

meth.genes

> dput(meth.genes[1:3])
c("RBL2", "BARHL2", "BARHL2")   

 

meth.ensembl

> dput(meth.ensembl[1:3])
c("ENSG00000175899", "ENSG00000184389", "ENSG00000184389")

Expected output:

c("ENSG00000175899", "ENSG00000184389.1", "ENSG00000184389.2") 
r bioinformatics bioconductor biomart
2 Answers
1 vote

We can use ave for this:

ave(meth.ensemble, meth.genes,
    FUN = function(z) if (length(z) == 1) z else paste(z, seq_along(z), sep = "."))
# [1] "ENSG00000175899"   "ENSG00000184389.1" "ENSG00000184389.2"

Data

meth.genes <- c("RBL2", "BARHL2", "BARHL2")   
meth.ensemble <- c("ENSG00000175899", "ENSG00000184389", "ENSG00000184389") 
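To tie this back to the error in the question: the "invalid 'row.names' length" message most likely comes from building meth.ensembl with %in%, which filters the rows of bm.meth and so need not return one ID per row of meth. Below is a minimal sketch of one way to keep the lengths aligned, assuming a toy meth and a mock bm.meth standing in for the real getBM() result (both are made up for illustration). It indexes with match() and then applies the same ave() call.

```r
# Toy stand-ins (assumptions, not the asker's real data)
meth <- data.frame(s1 = c(0.46, 0.14, 0.85),
                   s2 = c(0.60, 0.08, 0.87),
                   row.names = c("cg00000029", "cg00000165", "cg00000236"))
meth.genes <- c("RBL2", "BARHL2", "BARHL2")

# Mock of the biomaRt result: one row per gene symbol
bm.meth <- data.frame(ensembl_gene_id = c("ENSG00000184389", "ENSG00000175899"),
                      hgnc_symbol     = c("BARHL2", "RBL2"),
                      stringsAsFactors = FALSE)

# match() returns one position per element of meth.genes,
# so the looked-up IDs have the same length as nrow(meth)
idx.meth     <- match(meth.genes, bm.meth$hgnc_symbol)
meth.ensembl <- bm.meth$ensembl_gene_id[idx.meth]

# Suffix only the IDs that occur more than once
new.names <- ave(meth.ensembl, meth.ensembl,
                 FUN = function(z) if (length(z) == 1) z else paste(z, seq_along(z), sep = "."))

# Genes missing from bm.meth would give NA here and would need
# handling before this assignment; the toy data has none
rownames(meth) <- new.names
rownames(meth)
```

With real data, some symbols may be absent from the biomaRt result, so it is worth checking anyNA(meth.ensembl) before assigning the row names.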

0 votes

You can paste rowid() onto the IDs, grouping by meth.genes:

library(data.table)

f <- \(e) if(length(e)>1) paste0(e,".", rowid(e)) else e

data.table(meth.genes, meth.ensemble)[,meth.ensemble:=f(meth.ensemble),meth.genes][]

Output:

   meth.genes     meth.ensemble
1:       RBL2   ENSG00000175899
2:     BARHL2 ENSG00000184389.1
3:     BARHL2 ENSG00000184389.2

The result can then be joined to meth.
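One possible way to do that join is sketched below. It assumes a toy meth and carries the original row names along in a helper column called probe (a hypothetical name, not from the question), so the new names can be looked up per row afterwards.

```r
library(data.table)

# Toy stand-ins (assumptions, not the asker's real data)
meth <- data.frame(s1 = c(0.46, 0.14, 0.85),
                   s2 = c(0.60, 0.08, 0.87),
                   row.names = c("cg00000029", "cg00000165", "cg00000236"))
meth.genes    <- c("RBL2", "BARHL2", "BARHL2")
meth.ensemble <- c("ENSG00000175899", "ENSG00000184389", "ENSG00000184389")

# Same helper as above, written with function() so it also runs on R < 4.1
f <- function(e) if (length(e) > 1) paste0(e, ".", rowid(e)) else e

# "probe" is a hypothetical helper column holding the original row names
dt <- data.table(probe = rownames(meth), meth.genes, meth.ensemble)
dt[, meth.ensemble := f(meth.ensemble), by = meth.genes]

# Look up the new name for each row of meth by its original probe ID
rownames(meth) <- dt$meth.ensemble[match(rownames(meth), dt$probe)]
rownames(meth)
```

Keeping the probe IDs in the table means the lookup does not depend on dt and meth being in the same row order.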
