我试图使用生物导体的getBM()函数来检索某个基因的序列,这是我的脚本。
setwd(".")
options(stringsAsFactors = FALSE)
cat("\014")
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
listOfBiocPackages <- c("annotate", "GEOquery", "biomaRt")
bioCpackagesNotInstalled <- which( !listOfBiocPackages %in% rownames(installed.packages()) )
cat("package missing listOfBiocPackages[", bioCpackagesNotInstalled, "]: ", listOfBiocPackages[bioCpackagesNotInstalled], "\n", sep="")
# check there's still something left to install
if( length(bioCpackagesNotInstalled) ) {
BiocManager::install(listOfBiocPackages[bioCpackagesNotInstalled])
}
library("easypackages")
libraries(listOfBiocPackages)
mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset("hsapiens_gene_ensembl", mart)
thisAnnotLookup <- getBM(mart=mart, attributes=c("affy_hg_u133a", "ensembl_gene_id", "gene_biotype", "external_gene_name", "gene_flank"), filter=c("affy_hg_u133a", "upstream_flank"), values=list(affyid=c("203423_at", "204088_at", "204511_at", "204911_s_at", "205234_at")), uniqueRows=TRUE, checkFilters=FALSE)
print(thisAnnotLookup)
这个脚本给我的错误是:
'names' attribute [2] must be the same length as the vector [1]
我无法理解.
有谁能帮我解决这个问题吗?谢谢!
如果你有两个过滤器,你需要提供一个2个元素的列表。但更大的问题是,upstream_flank可能无法使用,这取决于你的连接,请见 这个问题出现在许多用户身上. 对我来说,我上次用的是亚洲镜,还能用。
library(biomaRt)
mart <- useEnsembl(biomart = "ensembl",
dataset = "hsapiens_gene_ensembl",
mirror = "asia")
然后..:
S = getSequence(id=c("203423_at","204088_at","204511_at","204911_s_at","205234_at"),
type="affy_hg_u133a",seqType="gene_flank",upstream = 20,mart = mart)
thisAnnotLookup <- getBM(mart=mart,
attributes=c("affy_hg_u133a", "ensembl_gene_id", "gene_biotype", "external_gene_name"),
filter="affy_hg_u133a",
values=c("203423_at","204088_at","204511_at","204911_s_at","205234_at"),
uniqueRows=TRUE, checkFilters=FALSE)
merge(thisAnnotLookup,S)
affy_hg_u133a ensembl_gene_id gene_biotype external_gene_name
1 203423_at ENSG00000114115 protein_coding RBP1
2 204088_at ENSG00000135124 protein_coding P2RX4
3 204511_at ENSG00000006607 protein_coding FARP2
4 204911_s_at ENSG00000110171 protein_coding TRIM3
5 205234_at ENSG00000168679 protein_coding SLC16A4
gene_flank
1 CTCCTCTTCCTTTGTAGGGG
2 GATTCCTCTCACCTCGGCCT
3 CGCGGGGGCTGCCGGCGGGC
4 CACCTTTCTACCCCTTAACT
5 GCCTGAGATGACATCAAATC
对我来说,似乎超级挑剔。我觉得你最好用直接从基因组上获取序列。