利用biopython NcbitblastnCommandline提取非同义替换

问题描述 投票:0回答:1

我正在尝试使用NcbitblastnCommandline对核苷酸序列进行蛋白质查询,然后报告命中。该程序运行没有错误。但是,在结果中,我的查询序列结果包含XXXXXXXX而不是我的输入序列。谁知道怎么解决这个问题?

我使用的代码是:

output=NcbitblastnCommandline(query=QUERY, subject=ALL_SUBJECTS, evalue=0.001, outfmt=5)()[0]
blast_result_record = NCBIXML.read(StringIO(output))
print(output)

我hsp_qseq输出看起来像这样(很多XXXXX的):DTLIGVAITDGNQQIMLFSNEGKAIRFAETDVRAMGRTAKGVRGMRVSFASSTLXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXPETGEVLCASANGYGKRTPVNDFPTKKRGGKGVIAIKTSERNGELVGAVSIDETKELLLISDGGTLVRTRAAEVAMTGRNAQGVRLIRLSEEETLVGVVSIXXXXXXXXXXXXXXXXXXXXSEEAVSNNEDTSEE

虽然我的查询其实是这样的:DTLIGVAITDGNQQIMLFSNEGKAIRFAETDVRAMGRTAKGVRGMRVSFASSTLSEED ADVENDDSDDNDDSADSSLVSRIVSLVVVPETGEVLCASANGYGKRTPVNDFPTKKRG GKGVIAIKTSERNGELVGAVSIDETKELLLISDGGTLVRTRAAEVAMTGRNAQGVRLI RLSEEETLVGVVSIEAVEDEEELLEGEVDTTETDSEEAVSNNEDTSEE

我弄乱了什么吗?

python bioinformatics biopython blast
1个回答
0
投票

我的方法是在TBLASTN之后找到非同义替换:


from Bio.Blast.Applications import NcbitblastnCommandline
from io import StringIO
from Bio.Blast import NCBIXML
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio import SeqIO
import os
from Bio.SeqRecord import SeqRecord
from Bio.Seq import MutableSeq

file="subject.fasta"
QUERY="query.fasta"
for seq_record in SeqIO.parse(open(QUERY, mode='r'), 'fasta'):
    query_list = seq_record.seq
output=NcbitblastnCommandline(query=QUERY, subject=file, max_target_seqs=1, evalue=0.001,outfmt=5)()[0]
blast_result_record = NCBIXML.parse(StringIO(output))
count = 0
for blast_record in blast_result_record:
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            if count > 0: continue 
            count+=1
            mutation_list = []
            for index in range(len(hsp.match)):
                match_aa = hsp.match[index]
                str_match = MutableSeq(hsp.query[0:index])
                number_gap = str_match.count("-")
                index_in_query = index
                match_query = hsp.query[index]
                if number_gap > 0:
                    index_in_query = len(str_match) - number_gap
                if match_aa == " " or match_aa == "+":
                    if match_query == "-":
                        mutation = str(index_in_query + 1) + query_list[index_in_query] + 'ins' +  hsp.sbjct[index]
                    else:
                        mutation = str(index_in_query + 1) + query_list[index_in_query] + '>' +  hsp.sbjct[index]
                    mutation_list.append(mutation)
            print(mutation_list)
© www.soinside.com 2019 - 2024. All rights reserved.