在 Biopyton 中建立 Fastq 质量控制循环

问题描述 投票:0回答:1

我试图将其变成一个循环,因为我有很多文件,我正在使用 Biopython,但我不确定是否可能。

good_reads = (
    rec
    for rec in SeqIO.parse(rec1, "fastq")
    if max(rec.letter_annotations["phred_quality"]) >= 30
)
count = SeqIO.write(good_reads, "rec1.fastq", "fastq")

鉴于我有rec1、rec2等

bioinformatics biopython fastq
1个回答
0
投票

我的尝试:

给定输入:

fastq_1.fq

@SEQ_ID_1_1
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65

fastq_2.fq

@SEQ_ID_1_3
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
@SEQ_ID_2_3
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

fastq_3.fq

@SEQ_ID_1_2
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
@SEQ_ID_2_2
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65

和代码:


from Bio import SeqIO

import Bio

print('\n-------------------------------')

print('\n Biopython Version : ', Bio.__version__)

print('\n-------------------------------')


fq_list = ['fastq_1.fq' , 'fastq_2.fq', 'fastq_3.fq'  ]

good_reads = ([(rec.format("fastq") for rec  in SeqIO.parse(seq, "fastq") if max(rec.letter_annotations["phred_quality"]) >= 30) for seq in fq_list ])


print('\n\ngood_reads : ', good_reads)

# this one empties the generators list
# for i in good_reads:
#     print(*i)      
    
with open("good_reads_2.fasta", "w+") as output_handle:
    
    for recs in good_reads:
        for rec in recs:
            output_handle.write(rec)

输出

good_reads_2.fasta

@SEQ_ID_1_1
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
@SEQ_ID_1_3
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
@SEQ_ID_1_2
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
@SEQ_ID_2_2
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65

我不是测序专家,但也许这两个链接更能说明碱基质量得分:

FASTQ Phred33 平均碱基质量得分

以正确的方式平均底噪质量分数

让我们知道它们是否有意义

最新问题
© www.soinside.com 2019 - 2024. All rights reserved.