Python: Reading and printing sequences from a FASTA file

Question

I am trying to print only the sequence ID of each record in a FASTA file, not the whole description line, with the GC content next to it, for example:

Seq1   40%
Seq2   37%
Seq3   12%

When I run this code, nothing happens.

def main():
    calcGC()

def calcGC():
    fileReader = open("Sequences.fasta",'r')
    for line in fileReader:
        seqID = line.startswith (">")
        seq = line[0:]

    gc_count = float((seq.count("G") + seq.count("C"))) / len(seq)*100
    print(seqID+"   "+ gc_count)

    fileReader.close

main()

Answer 1

This will print something, but not the desired output. There are several errors in your code:

The indentation is wrong.

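seqID is a Boolean: it only records whether the line starts with the > character. I assume you wanted to use that test to decide whether the line should be printed. In any case, if what you want to print is the line number of each line that starts with >, it would look like this: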

def calcGC():
    fileReader = open("Sequences.fasta", 'r')
    for seqID, line in enumerate(fileReader):
        if line.startswith(">"):
            seq = line[0:]

            gc_count = float((seq.count("G") + seq.count("C"))) / len(seq) * 100
            print(seqID, "   ", gc_count)

    fileReader.close()

if __name__ == "__main__":
    calcGC()
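The code above prints a line number, while the question asks for the sequence ID. A minimal sketch of pulling the ID out of a header line follows. It assumes the ID is the first whitespace-separated token after the > character, and header_to_id is a hypothetical helper name, not part of the original code.

```python
def header_to_id(header_line):
    """Return the sequence ID from a FASTA header such as '>Seq1 some description'."""
    # Drop the leading '>' and keep only the first whitespace-separated token.
    # Assumption: everything after the first whitespace is free-text description.
    fields = header_line[1:].split()
    return fields[0] if fields else ""


print(header_to_id(">Seq1 example description\n"))  # prints: Seq1
```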

Answer 2

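The first thing I see here is a scope problem. You use line at a shallower indentation level, but the per-line work has to happen inside the loop of the calcGC function. In Python, blocks are defined by indentation, so the GC calculation and the print sit outside the loop and run only once, after the last line has been read.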

The second thing I see here, and do not understand, is the use of the .startswith() method. It returns a Boolean, not a seqID... maybe add an if statement here?
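To illustrate the point about .startswith(), here is a small sketch showing that it returns True or False and is meant to be used as a condition. The classify function and its return strings are made up for this illustration.

```python
def classify(line):
    """Label a FASTA line as a header or a sequence line."""
    # str.startswith returns a bool, so it belongs in an if test,
    # not in a variable that is later treated as the ID text.
    if line.startswith(">"):
        return "header"
    return "sequence"


print(">Seq1 description".startswith(">"))  # prints: True
print(classify(">Seq1 description"))        # prints: header
print(classify("ACGTTGCA"))                 # prints: sequence
```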

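Another thing: you should open the file with a with statement. It closes the file for you and gives you a file object that you can iterate over line by line. By the way, the line seq = line[0:] is not needed here; you can use seq directly as the loop variable, as follows: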

def main():
    calcGC()

def calcGC():
    with open("Sequences.fasta", 'r') as fp :
        # using the enumerate here will give you both the index and the line itself. I assume here that the seqID you wanted to use is the Line index....
        for seqID, seq in enumerate(fp):
            if seq.startswith(">"):
                gc_count = float((seq.count("G") + seq.count("C"))) / len(seq) * 100
                print("{} {}".format(seqID,gc_count))


main()
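The code above treats the line index as the seqID and counts G and C in the header line itself. If the goal is the output format shown in the question, a rough sketch could look like the one below. It assumes Python 3, that the ID is the first token after >, and that a sequence may span several lines. The names gc_percentages and gc_percent are hypothetical, and the in-memory example data stands in for Sequences.fasta.

```python
import io


def gc_percent(seq):
    """Return the GC content of seq as a percentage (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq) * 100


def gc_percentages(lines):
    """Yield (seq_id, gc_percent) for each record in an iterable of FASTA lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, gc_percent("".join(chunks))
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            chunks = []
        elif seq_id is not None:
            # Sequence data may be wrapped over several lines; upper-case it
            # so that lower-case bases are counted too.
            chunks.append(line.upper())
    if seq_id is not None:
        yield seq_id, gc_percent("".join(chunks))


# Made-up example data; with a real file, pass the object from
# `with open("Sequences.fasta") as fp:` instead.
example = io.StringIO(">Seq1 first record\nGGCC\nAATT\n>Seq2 second record\nGCAAAAAA\n")
for seq_id, gc in gc_percentages(example):
    print("{}   {:.0f}%".format(seq_id, gc))
```

Because the function takes any iterable of lines, it works the same on an open file, a list of strings, or the io.StringIO object used here.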
