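I am trying to write code that builds a dictionary from the DNA sequences in a FASTA file. Each sequence name is marked by a ">" at the start of the line that contains the name, and the bases that follow belong to that dictionary entry until the next name is encountered. The for loop I wrote only creates a dictionary entry for the last sequence, and I don't understand why.
This is the code I wrote: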
def read_fasta():
    with open('../data/problem_1_question_4_new.fasta', 'r') as fasta:
        for line in fasta:
            rows = line.split()
            sequencedict = {}
            sequence = ''
            if str(rows)[2] == '>':
                sequencename = str(rows)[3:-2]
            else:
                sequence += str(rows)[2:-2]
                sequencedict[sequencename] = sequence
    return(sequencedict)

print(read_fasta())
I assume there is an error in my indentation, but I can't tell where.
Edit: I have fixed the error. I moved the line "sequencedict = {}" outside the for loop. My new code is:
def read_fasta():
    with open('../data/problem_1_question_4_new.fasta', 'r') as fasta:
        sequencedict = {}
        for line in fasta:
            rows = line.split()
            sequence = ''
            if str(rows)[2] == '>':
                sequencename = str(rows)[3:-2]
            else:
                sequence += str(rows)[2:-2]
                sequencedict[sequencename] = sequence
    return(sequencedict)

print(read_fasta())
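There are two things you need to look at. First, you don't want to (re)create the container on every iteration of the loop, because that throws away the work done so far. Second, you probably want to append the current line to a list stored under the key rather than just assigning the key's value, because plain assignment only keeps the last line seen for any given key.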
def read_fasta():
    with open('../data/problem_1_question_4_new.fasta', 'r') as fasta:
        sequencedict = {}  # move creation of the container out of the for loop
        sequencename = None  # no header seen yet
        for line in fasta:
            line = line.strip()  # drop the trailing newline and surrounding whitespace
            if line.startswith('>'):
                sequencename = line[1:]  # everything after '>' is the name
            elif line and sequencename is not None:
                # append this line to the list stored under the current name
                sequencedict.setdefault(sequencename, []).append(line)
    return sequencedict

print(read_fasta())
If you would rather concatenate strings than append to a list, try:
def read_fasta():
    with open('../data/problem_1_question_4_new.fasta', 'r') as fasta:
        sequencedict = {}  # move creation of the container out of the for loop
        sequencename = None  # no header seen yet
        for line in fasta:
            line = line.strip()  # drop the trailing newline and surrounding whitespace
            if line.startswith('>'):
                sequencename = line[1:]  # everything after '>' is the name
            elif line and sequencename is not None:
                # concatenate this line onto the string stored under the current name
                sequencedict[sequencename] = sequencedict.get(sequencename, "") + line
    return sequencedict

print(read_fasta())
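A variation on the same idea, as a minimal sketch rather than a definitive parser: it accepts any iterable of lines instead of a hard-coded path, so it can be tried without the file. The name parse_fasta and the sample records are hypothetical and do not come from the question. It collects lines in a list per name and joins them once at the end.

```python
import io


def parse_fasta(lines):
    """Map each '>' header to the concatenation of the sequence lines below it."""
    chunks = {}  # name -> list of sequence lines; created once, before the loop
    name = None  # no header seen yet
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith('>'):
            name = line[1:]
            chunks[name] = []  # start a fresh record for this header
        elif name is not None:
            chunks[name].append(line)
    # join each record's lines into a single string
    return {key: ''.join(parts) for key, parts in chunks.items()}


# Hypothetical sample data standing in for the real file.
sample = io.StringIO(">seq1\nACGT\nTTGA\n>seq2\nGGCC\n")
print(parse_fasta(sample))  # {'seq1': 'ACGTTTGA', 'seq2': 'GGCC'}
```

To use it on a real file, pass the open file object: inside a with open(...) block, call parse_fasta(fasta).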
You need to declare your dict outside your for loop. As it stands, your dict is re-created, and therefore emptied, on every new iteration.
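The effect can be seen in isolation with a small sketch. The function names and the sample list here are made up for illustration. The only difference between the two functions is where the dict is created.

```python
def collect_inside(items):
    for item in items:
        seen = {}  # a brand-new dict on every pass, so earlier keys are lost
        seen[item] = True
    return seen


def collect_outside(items):
    seen = {}  # created once, so every key survives
    for item in items:
        seen[item] = True
    return seen


names = ['seq1', 'seq2', 'seq3']  # hypothetical sequence names
print(collect_inside(names))   # {'seq3': True}
print(collect_outside(names))  # {'seq1': True, 'seq2': True, 'seq3': True}
```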