How can I get the nucleotide sequence for a protein with Biopython?


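I have some proteins and I want to find their corresponding nucleotide sequences. I also have the genome in which the proteins were found, and in that genome I have located the gene ID for each protein. However, I can't work out how to get the nucleotide sequence from the gene ID. I tried using Entrez efetch: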

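from Bio import Entrez

Entrez.email = "your.name@example.com"  # placeholder: NCBI asks for your own contact address
with open("genome.gb", "w") as out_handle:
    # Query the Gene database for gene ID 2703488 and save the text response
    request = Entrez.efetch(db="gene", id="2703488", rettype="gb", retmode="text")
    out_handle.write(request.read())
    request.close()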

But this only returns the following:

1. G
tail component [Escherichia virus Lambda]
Other Aliases: lambdap14
Other Designations: tail component
Annotation:  NC_001416.1 (9711..10133)
ID: 2703488

Is there any way to get the actual nucleotide sequence with efetch? Thanks in advance!

python bioinformatics biopython ncbi rentrez
1 Answer

You can use the accession and coordinates in the Annotation: line to fetch that region from the NCBI Nucleotide database (nuccore):

>>> from Bio import Entrez, SeqIO
>>> Entrez.email = "your.name@example.com"  # placeholder: use your own address
>>> request = Entrez.efetch(db="nuccore", id="NC_001416.1", rettype="fasta", seq_start="9711", seq_stop="10133")
>>> seq_record = SeqIO.read(request, "fasta")
>>> seq_record
SeqRecord(seq=Seq('ATGTTCCTGAAAACCGAATCATTTGAACATAACGGTGTGACCGTCACGCTTTCT...TGA', SingleLetterAlphabet()), id='NC_001416.1:9711-10133', name='NC_001416.1:9711-10133', description='NC_001416.1:9711-10133 Enterobacteria phage lambda, complete genome', dbxrefs=[])
>>> print(seq_record.seq)
ATGTTCCTGAAAACCGAATCATTTGAACATAACGGTGTGACCGTCACGCTTTCTGAACTGTCAGCCCTGCAGCGCATTGAGCATCTCGCCCTGATGAAACGGCAGGCAGAACAGGCGGAGTCAGACAGCAACCGGAAGTTTACTGTGGAAGACGCCATCAGAACCGGCGCGTTTCTGGTGGCGATGTCCCTGTGGCATAACCATCCGCAGAAGACGCAGATGCCGTCCATGAATGAAGCCGTTAAACAGATTGAGCAGGAAGTGCTTACCACCTGGCCCACGGAGGCAATTTCTCATGCTGAAAACGTGGTGTACCGGCTGTCTGGTATGTATGAGTTTGTGGTGAATAATGCCCCTGAACAGACAGAGGACGCCGGGCCCGCAGAGCCTGTTTCTGCGGGAAAGTGTTCGACGGTGAGCTGA
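The two steps (read the Annotation: line from the Gene summary, then request that slice of the RefSeq record from nuccore) can be wrapped in a small helper. This is a minimal sketch, assuming the Annotation line keeps the `ACCESSION (start..stop)` layout shown in the question; the function names are hypothetical. It also assumes that minus-strand genes are shown with a `, complement` suffix inside the parentheses, and uses the E-utilities `strand` parameter (1 = plus, 2 = minus) so such genes come back as the reverse complement. The lambda gene G example does not need that.

```python
import re

# Matches the Annotation line of an NCBI Gene text summary, e.g.
#   "Annotation:  NC_001416.1 (9711..10133)"
# Assumption: minus-strand genes look like
#   "Annotation:  Chromosome 17 NC_000017.11 (7661779..7687538, complement)"
ANNOTATION_RE = re.compile(
    r"Annotation:.*?(?P<acc>[A-Za-z0-9_]+\.\d+)\s*"
    r"\((?P<start>\d+)\.\.(?P<stop>\d+)(?P<comp>,\s*complement)?\)"
)


def efetch_kwargs_from_summary(summary_text):
    """Build keyword arguments for Entrez.efetch() that fetch the gene's
    nucleotide region as FASTA, using the summary's Annotation line."""
    match = ANNOTATION_RE.search(summary_text)
    if match is None:
        raise ValueError("no 'Annotation: ACCESSION (start..stop)' line found")
    return {
        "db": "nuccore",
        "id": match["acc"],
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": int(match["start"]),  # 1-based, inclusive
        "seq_stop": int(match["stop"]),
        "strand": 2 if match["comp"] else 1,  # 2 = reverse complement
    }


def fetch_gene_nucleotide(gene_id, email):
    """Fetch the Gene summary for gene_id, then the matching nucleotide
    region as a SeqRecord. Needs Biopython and network access."""
    # Imported here so the parser above can be used without Biopython
    from Bio import Entrez, SeqIO

    Entrez.email = email
    # Same Gene query as in the question; it returns the text summary
    with Entrez.efetch(db="gene", id=gene_id, rettype="gb", retmode="text") as handle:
        summary = handle.read()
    if isinstance(summary, bytes):  # be tolerant of binary handles
        summary = summary.decode("utf-8")

    kwargs = efetch_kwargs_from_summary(summary)
    with Entrez.efetch(**kwargs) as handle:
        return SeqIO.read(handle, "fasta")
```

For example, `fetch_gene_nucleotide("2703488", "your.name@example.com")` should return a record for `NC_001416.1:9711-10133`, like the one above. Keeping the parser separate from the network calls makes it easy to check against a saved summary such as `genome.gb`.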