I'm trying to convert a FASTA file into a dictionary in Python using a for loop, but only my last iteration is captured

Question (votes: 0, answers: 2)

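I'm trying to write code that builds a dictionary from the DNA sequences in a FASTA file, where each sequence's name is marked by a ">" at the start of the line containing the name. The bases that follow are assigned to that dictionary entry until the next name is encountered. The for loop I wrote only creates a dictionary entry for the last sequence, and I don't understand why.

Here is the code I wrote: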

def read_fasta():
    with open('../data/problem_1_question_4_new.fasta', 'r') as fasta:
        for line in fasta:
            rows = line.split()
            sequencedict = {}
            sequence = ''
            if str(rows)[2] == '>':
                sequencename = str(rows)[3:-2]
            else:
                sequence += str(rows)[2:-2]
            sequencedict[sequencename] = sequence
    return(sequencedict)
print(read_fasta())

I assume I have an indentation error, but I can't tell where.

Edit: I've resolved the error. I moved the line "sequencedict = {}" outside the for loop. My new code is:

def read_fasta():
    with open('../data/problem_1_question_4_new.fasta', 'r') as fasta:
        sequencedict = {}
        for line in fasta:
            rows = line.split()
            sequence = ''
            if str(rows)[2] == '>':
                sequencename = str(rows)[3:-2]
            else:
                sequence += str(rows)[2:-2]
            sequencedict[sequencename] = sequence
    return(sequencedict)
print(read_fasta())
python for-loop bioinformatics fasta
2 Answers
1 vote

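There are two things you need to look at. First, you don't want to (re)create the container on every iteration of the loop, because that throws away the work done so far. Second, you probably want to append the current item to a list identified by the key rather than just setting the key's value, because plain assignment only keeps the last iteration for any given key.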

def read_fasta():
    with open('../data/problem_1_question_4_new.fasta', 'r') as fasta:

        sequencedict = {}  # create the container once, outside the for loop

        for line in fasta:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith('>'):
                sequencename = line[1:]  # header line: the name follows '>'
                sequencedict.setdefault(sequencename, [])
            else:
                # append only the current line to the list for the current name
                sequencedict.setdefault(sequencename, []).append(line)
    return sequencedict
print(read_fasta())

If you want to concatenate strings instead of appending to a list, try:

def read_fasta():
    with open('../data/problem_1_question_4_new.fasta', 'r') as fasta:
        sequencedict = {}  # create the container once, outside the for loop
        for line in fasta:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith('>'):
                sequencename = line[1:]  # header line: the name follows '>'
                sequencedict.setdefault(sequencename, '')
            else:
                # concatenate only the current line onto the right key
                sequencedict[sequencename] = sequencedict.get(sequencename, '') + line

    return sequencedict
print(read_fasta())
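
The approach in this answer can also be tried without the file on disk. The following is a minimal sketch, not the asker's or the answerer's code: the helper name parse_fasta and the sample records are hypothetical, and it assumes every sequence line belongs to the most recent ">" header. It collects the lines of each record in a list and joins them once at the end.

```python
import io

def parse_fasta(lines):
    """Map each '>' header name to its full sequence (hypothetical helper)."""
    chunks = {}      # name -> list of sequence lines, in file order
    name = None      # no record seen yet
    for raw in lines:
        line = raw.strip()
        if not line:
            continue                      # ignore blank lines
        if line.startswith('>'):
            name = line[1:]               # everything after '>' is the name
            chunks.setdefault(name, [])
        elif name is not None:
            chunks[name].append(line)     # line belongs to the current record
    # join once at the end instead of repeated string concatenation
    return {key: ''.join(parts) for key, parts in chunks.items()}

# made-up sample data, for illustration only
sample = io.StringIO(">seq1\nACGT\nTTGA\n>seq2\nGGCC\n")
result = parse_fasta(sample)
print(result)  # {'seq1': 'ACGTTTGA', 'seq2': 'GGCC'}
```

Because the function accepts any iterable of lines, an open file object can be passed in the same way as the in-memory sample.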

1 vote

You need to declare your dict outside the for loop. As it stands, your dict is re-created, and therefore emptied, on every new iteration.
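
The effect is easy to reproduce in isolation. This is a small sketch with hypothetical function names and made-up data: the only difference between the two functions is where the dict is created.

```python
def build_inside(pairs):
    for key, value in pairs:
        result = {}          # re-created on every pass: earlier entries are lost
        result[key] = value
    return result

def build_outside(pairs):
    result = {}              # created once, before the loop
    for key, value in pairs:
        result[key] = value
    return result

pairs = [("seq1", "ACGT"), ("seq2", "GGCC")]   # made-up data
print(build_inside(pairs))    # {'seq2': 'GGCC'}
print(build_outside(pairs))   # {'seq1': 'ACGT', 'seq2': 'GGCC'}
```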
