如何从其他文件写入fastq文件

问题描述 投票:1回答:1

我被要求从两个文件中读取(左右读取)Aip02.R1.fastq和Aip02.R2.fastq,并使用zip函数获取交错的fasta文件。左右文件是fastq文件,但是当我将它们压缩在一起以创建新的fastq文件时,writer函数不再起作用。它给我错误“ SeqRecord(id =)具有无效的序列。”

  #!/usr/bin/env python3
  # Import Seq, SeqRecord, and SeqIO
  from Bio import SeqIO
  from Bio.SeqRecord import SeqRecord
  from Bio.Seq import Seq
  leftReads = SeqIO.parse("/scratch/AiptasiaMiSeq/fastq/Aip02.R1.fastq", "fastq")
  rightReads = SeqIO.parse("/scratch/AiptasiaMiSeq/fastq/Aip02.R2.fastq","fastq")
  A= zip(leftReads,rightReads)
  SeqIO.write(SeqRecord(list(A)), "interleave.fastq", "fastq")
python-3.x sequence bioinformatics biopython
1个回答
1
投票

您的前进和后退序列可能具有相同的ID。因此,请使用以下代码为ID添加后缀。我在这里使用了/1/2,但也使用了.f.r之类的东西。

from Bio import SeqIO
import itertools

def interleave(iter1, iter2) :
    for (forward, reverse) in itertools.izip(iter1, iter2):
        assert forward.id == reverse.id
        forward.id += "/1"
        reverse.id += "/2"
        yield forward
        yield reverse

leftReads = SeqIO.parse("/scratch/AiptasiaMiSeq/fastq/Aip02.R1.fastq", "fastq")
rightReads = SeqIO.parse("/scratch/AiptasiaMiSeq/fastq/Aip02.R2.fastq","fastq")

handle = open("interleave.fastq", "w")
count = SeqIO.write(interleave(leftReads, rightReads), handle, "fastq")
handle.close()
print("{} records written to interleave.fastq".format(count))

对于大型fastq文件,此代码可能会变慢。例如,请参见here,他们报告创建一个2GB的fastq文件需要14分钟。因此他们给出了改进的方法:

from Bio.SeqIO.QualityIO import FastqGeneralIterator
import itertools

file_f = "/scratch/AiptasiaMiSeq/fastq/Aip02.R1.fastq"
file_r = "/scratch/AiptasiaMiSeq/fastq/Aip02.R2.fastq"

handle = open("interleave.fastq", "w")
count = 0

f_iter = FastqGeneralIterator(open(file_f,"rU"))
r_iter = FastqGeneralIterator(open(file_r,"rU"))
for (f_id, f_seq, f_q), (r_id, r_seq, r_q) in itertools.izip(f_iter,r_iter):
    assert f_id == r_id
    count += 2
    #Write out both reads with "/1" and "/2" suffix on ID
    handle.write("@%s/1n%sn+n%sn@%s/2n%sn+n%sn" 
                 % (f_id, f_seq, f_q, r_id, r_seq, r_q))
handle.close()
print("{} records written to interleave.fastq".format(count)
© www.soinside.com 2019 - 2024. All rights reserved.