How can I check whether a given file is FASTA?

Question (votes: 0, answers: 2)

I am writing a program that needs a .fasta file as input at an early stage. Right now, I validate the input with this function:

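def file_validation(fasta):
    # `fasta` is the prompt text shown to the user
    while True:
        file_name = input(fasta)  # use raw_input() on Python 2
        try:
            # Opening the file is what actually raises IOError when it is missing
            with open(file_name):
                pass
        except IOError:
            print("Please give the name of the fasta file that exists in the folder!")
            continue

        if not file_name.endswith(".fasta"):
            print("Please give the name of the file with the .fasta extension!")
        else:
            break
    return file_name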

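This function works, but it still leaves room for error: the user could supply a file whose name ends in .fasta but whose contents are not FASTA at all. What can I do to prevent this and tell the user that his or her .fasta file is corrupted?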

python user-input
2 Answers
6
votes

Why not just parse the file as FASTA and see whether it breaks?

Using Biopython, a non-FASTA file fails to parse and yields an empty generator:

from Bio import SeqIO

my_file = "example.csv"  # Obviously not FASTA

def is_fasta(filename):
    with open(filename, "r") as handle:
        fasta = SeqIO.parse(handle, "fasta")
        return any(fasta)  # False when `fasta` is empty, i.e. wasn't a FASTA file

is_fasta(my_file)
# False
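
To tell the user that the file is corrupted, as the question asks, a content check like is_fasta could be plugged into the original prompt loop. The following is only a sketch: prompt_for_fasta, content_check and read_input are hypothetical names that do not appear in the question or the answer, and read_input exists only so the loop can be driven without a keyboard.

```python
import os


def prompt_for_fasta(prompt, content_check, read_input=input):
    """Ask repeatedly until the user names an existing .fasta file
    whose contents pass content_check.

    content_check: any callable taking a path and returning True/False,
                   for example the is_fasta function above (assumption).
    read_input:    hypothetical hook that replaces input() in tests.
    """
    while True:
        file_name = read_input(prompt).strip()
        if not os.path.isfile(file_name):
            print("Please give the name of a fasta file that exists in the folder!")
        elif not file_name.endswith(".fasta"):
            print("Please give the name of the file with the .fasta extension!")
        elif not content_check(file_name):
            # Right extension, wrong contents: the case the question asks about
            print("The file has a .fasta extension but does not look like FASTA!")
        else:
            return file_name
```

With the Biopython check above, the call would be something like prompt_for_fasta("FASTA file: ", is_fasta). Checking the contents last keeps the cheaper existence and extension checks in front of the parse.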

0
votes

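Biopython's SeqIO returns a Bio.SeqIO.FastaIO.FastaIterator even when the input is an empty file.

This may not be the most elegant solution, but you can walk through the lines and check for the ">" character: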

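import sys

# Validate a FASTA file, assuming every record is one ">" header line
# followed by exactly one sequence line
with open("sequences.fasta", "r") as fastafile:
    for header in fastafile:
        # The first line of each pair must be a header
        valid = header.startswith(">")
        # The next line must exist and must be sequence data, not another header
        sequence = next(fastafile, None)
        if sequence is None or sequence.startswith(">"):
            valid = False
        if not valid:
            print("The file provided does not appear to be a proper FASTA file!")
            sys.exit(1)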
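
The line-by-line check above expects each record to be exactly one header line followed by one sequence line, so it rejects files whose sequences are wrapped over several lines. A more tolerant variant might look like the sketch below. The name check_fasta and its rules (blank lines are ignored, every header needs at least one sequence line, nothing may come before the first header) are assumptions made for illustration, not part of the answer or of any FASTA specification.

```python
def check_fasta(path):
    """Return (True, "") if the file looks like FASTA, else (False, reason)."""
    records = 0
    has_sequence = True  # has the current record seen a sequence line yet?
    with open(path, "r") as handle:
        for number, raw in enumerate(handle, start=1):
            line = raw.strip()
            if not line:
                continue  # assumption: blank lines are tolerated
            if line.startswith(">"):
                if not has_sequence:
                    return False, "header before line %d has no sequence" % number
                records += 1
                has_sequence = False
            elif records == 0:
                return False, "line %d: data before the first '>' header" % number
            else:
                has_sequence = True
    if records == 0:
        return False, "no '>' header found"
    if not has_sequence:
        return False, "last header has no sequence"
    return True, ""
```

Returning a reason alongside the boolean lets the caller print something more specific than a generic error. Unlike the loop above, this version also rejects an empty file.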