我正在尝试使用NcbitblastnCommandline对核苷酸序列进行蛋白质查询,然后报告命中。该程序运行没有错误。但是,在结果中,我的查询序列结果包含XXXXXXXX而不是我的输入序列。谁知道怎么解决这个问题?
我使用的代码是:
output=NcbitblastnCommandline(query=QUERY, subject=ALL_SUBJECTS, evalue=0.001, outfmt=5)()[0]
blast_result_record = NCBIXML.read(StringIO(output))
print(output)
我hsp_qseq输出看起来像这样(很多XXXXX的):DTLIGVAITDGNQQIMLFSNEGKAIRFAETDVRAMGRTAKGVRGMRVSFASSTLXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXPETGEVLCASANGYGKRTPVNDFPTKKRGGKGVIAIKTSERNGELVGAVSIDETKELLLISDGGTLVRTRAAEVAMTGRNAQGVRLIRLSEEETLVGVVSIXXXXXXXXXXXXXXXXXXXXSEEAVSNNEDTSEE
虽然我的查询其实是这样的:DTLIGVAITDGNQQIMLFSNEGKAIRFAETDVRAMGRTAKGVRGMRVSFASSTLSEED ADVENDDSDDNDDSADSSLVSRIVSLVVVPETGEVLCASANGYGKRTPVNDFPTKKRG GKGVIAIKTSERNGELVGAVSIDETKELLLISDGGTLVRTRAAEVAMTGRNAQGVRLI RLSEEETLVGVVSIEAVEDEEELLEGEVDTTETDSEEAVSNNEDTSEE
我弄乱了什么吗?
我的方法是在TBLASTN之后找到非同义替换:
from Bio.Blast.Applications import NcbitblastnCommandline
from io import StringIO
from Bio.Blast import NCBIXML
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio import SeqIO
import os
from Bio.SeqRecord import SeqRecord
from Bio.Seq import MutableSeq
file="subject.fasta"
QUERY="query.fasta"
for seq_record in SeqIO.parse(open(QUERY, mode='r'), 'fasta'):
query_list = seq_record.seq
output=NcbitblastnCommandline(query=QUERY, subject=file, max_target_seqs=1, evalue=0.001,outfmt=5)()[0]
blast_result_record = NCBIXML.parse(StringIO(output))
count = 0
for blast_record in blast_result_record:
for alignment in blast_record.alignments:
for hsp in alignment.hsps:
if count > 0: continue
count+=1
mutation_list = []
for index in range(len(hsp.match)):
match_aa = hsp.match[index]
str_match = MutableSeq(hsp.query[0:index])
number_gap = str_match.count("-")
index_in_query = index
match_query = hsp.query[index]
if number_gap > 0:
index_in_query = len(str_match) - number_gap
if match_aa == " " or match_aa == "+":
if match_query == "-":
mutation = str(index_in_query + 1) + query_list[index_in_query] + 'ins' + hsp.sbjct[index]
else:
mutation = str(index_in_query + 1) + query_list[index_in_query] + '>' + hsp.sbjct[index]
mutation_list.append(mutation)
print(mutation_list)