How to trim or pad sequences to a certain length with Biopython

Question  Votes: 0  Answers: 1

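What is the simplest way to trim or pad a set of Biopython FASTA files until they all have a certain length, so that they can be added to a multiple sequence alignment? This is similar to the answer to "BioPython AlignIO ValueError says strings must be same length?", but with multiple sequences that do not come from a text file, and in the end they should all be combined into a single multiple sequence alignment. The final goal is for every sequence to be 570 characters long. I plan to use all of this to build a phylogenetic tree.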

python bioinformatics biopython
1 Answer
0
votes

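I'm not familiar with Biopython, but I know you can do this easily with pysam: read the FASTA file, loop over each sequence, trim it to a specific size, and write it to a new FASTA file. See the example below: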

from pysam import FastxFile


fasta_q_file = "INPUT.fasta"
out_filename = "OUTPUT_NAME.fasta"
size_size_trim = 50


with FastxFile(fasta_q_file) as fh, open(out_filename, mode='w') as fout:
    for entry in fh:
        sequence_id = entry.name
        sequence = entry.sequence

        # Compare the sequence length (not the string itself) with the target size.
        if len(sequence) > size_size_trim:
            # Longer than the target: trim it and note the trimming in the header.
            fout.write(">{}_trimmed_to_{}_bp\n{}\n".format(sequence_id, size_size_trim, sequence[:size_size_trim]))
        elif len(sequence) == size_size_trim:
            # Already the target length: write it unchanged.
            fout.write(">{}\n{}\n".format(sequence_id, sequence))
        else:
            # sequences shorter than `size_size_trim` are not written.
            continue
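
The example above drops sequences shorter than the target, whereas the question also asks about padding. As a minimal sketch of that missing step, the following pure-Python helper trims long sequences and right-pads short ones with a gap character. The names `fit_to_length` and `fit_records`, the `-` pad character, and the sample records are assumptions made for illustration; they do not come from pysam or Biopython.

```python
def fit_to_length(sequence, target_length, pad_char="-"):
    """Return `sequence` trimmed or right-padded to exactly `target_length`."""
    if len(pad_char) != 1:
        raise ValueError("pad_char must be a single character")
    # Slicing trims long sequences; ljust pads short ones on the right.
    return sequence[:target_length].ljust(target_length, pad_char)


def fit_records(records, target_length, pad_char="-"):
    """Apply fit_to_length to every (name, sequence) pair."""
    return [(name, fit_to_length(seq, target_length, pad_char))
            for name, seq in records]


# Hypothetical example records; in practice these would be read from the FASTA files.
records = [("seq_a", "ACGTACGTAC"), ("seq_b", "ACG"), ("seq_c", "ACGTAC")]
fitted = fit_records(records, target_length=6)

# Write the result in FASTA layout.
for name, seq in fitted:
    print(">{}\n{}".format(name, seq))
```

For the question's case the target length would be 570. With Biopython, each fitted string could then be wrapped in a `SeqRecord` and the list passed to `Bio.Align.MultipleSeqAlignment`, which requires all records to have the same length. Padding only equalizes lengths and does not line up homologous positions, so a dedicated aligner is probably still needed before building a tree.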