I'm trying to use a for loop to convert a FASTA file into a dictionary in Python, but only my last iteration is captured

Question

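I'm trying to write code that builds a dictionary from the DNA sequences in a FASTA file, where each sequence name is marked by a ">" at the start of the line that contains it. The bases of a DNA sequence should keep being assigned to that dictionary entry until the next name is encountered. The for loop I wrote only creates a dictionary entry for the last sequence, and I don't understand why.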

Here is the code I wrote:

def read_fasta():
    with open('../data/problem_1_question_4_new.fasta', 'r') as fasta:
        for line in fasta:
            rows = line.split()
            sequencedict = {}
            sequence = ''
            if str(rows)[2] == '>':
                sequencename = str(rows)[3:-2]
            else:
                sequence += str(rows)[2:-2]
            sequencedict[sequencename] = sequence
    return(sequencedict)
print(read_fasta())

I assume there is an error in my indentation, but I can't tell where.

Edit: I've resolved the error. I moved the line "sequencedict = {}" outside the for loop. My new code is:

def read_fasta():
    with open('../data/problem_1_question_4_new.fasta', 'r') as fasta:
        sequencedict = {}
        for line in fasta:
            rows = line.split()
            sequence = ''
            if str(rows)[2] == '>':
                sequencename = str(rows)[3:-2]
            else:
                sequence += str(rows)[2:-2]
            sequencedict[sequencename] = sequence
    return(sequencedict)
print(read_fasta())

Answer 1

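There are two things you need to look at. First, you don't want to (re)create the container on every iteration of the loop, because that undoes the earlier work. Second, you probably want to append the current item to a list identified by the key rather than just setting the key's value, because setting it only captures the last iteration for any given key.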

def read_fasta():
    with open('../data/problem_1_question_4_new.fasta', 'r') as fasta:

        sequencedict = {}  # move creation of the container out of the for loop
        sequencename = None  # no header line seen yet

        for line in fasta:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith('>'):
                sequencename = line[1:]  # header line: start a new record
                sequencedict.setdefault(sequencename, [])
            elif sequencename is not None:
                sequencedict[sequencename].append(line)  ## append to the right key
    return sequencedict
print(read_fasta())

If you want to concatenate strings instead of appending to a list, try:

def read_fasta():
    with open('../data/problem_1_question_4_new.fasta', 'r') as fasta:
        sequencedict = {}  # move creation of the container out of the for loop
        sequencename = None  # no header line seen yet
        for line in fasta:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith('>'):
                sequencename = line[1:]  # header line: remember the current name
            elif sequencename is not None:
                # concatenate this line onto the string stored under the current name
                sequencedict[sequencename] = sequencedict.get(sequencename, "") + line  ## append to the right key

    return sequencedict
print(read_fasta())
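
To check the concatenation approach without the original data file, here is a minimal sketch that reads from any iterable of lines. The function name `parse_fasta` and the sample records are made up for illustration; they are not from the question. It assumes the whole header line after ">" is the sequence name.

```python
import io

def parse_fasta(handle):
    """Return {name: sequence} built from an iterable of FASTA lines."""
    records = {}   # created once, before the loop
    name = None    # no header line seen yet
    for raw in handle:
        line = raw.strip()
        if not line:
            continue                      # ignore blank lines
        if line.startswith(">"):
            name = line[1:]               # header: switch to a new record
            records.setdefault(name, "")
        elif name is not None:
            records[name] += line         # sequence line: extend the current record
    return records

# Hypothetical sample data standing in for the real file.
sample = io.StringIO(">seq1\nACGT\nTTGA\n>seq2\nGGCC\n")
result = parse_fasta(sample)
print(result)  # {'seq1': 'ACGTTTGA', 'seq2': 'GGCC'}
```

Because the function takes a file-like object rather than a path, the same code should also work with `open(...)` on the real file.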

Answer 2

You need to declare your dict outside the for loop. As it stands, your dict is re-created (reset to empty) on every new iteration.
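
A small sketch of the difference, using made-up names rather than the asker's file: the first loop rebuilds the dict on each pass, so only the final key survives, while the second keeps every key.

```python
names = ["seq1", "seq2", "seq3"]  # hypothetical keys for illustration

# Dict created inside the loop: each pass throws away the previous one.
for n in names:
    inside = {}
    inside[n] = len(n)

# Dict created once, before the loop: entries accumulate.
outside = {}
for n in names:
    outside[n] = len(n)

print(inside)   # {'seq3': 4}
print(outside)  # {'seq1': 4, 'seq2': 4, 'seq3': 4}
```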
