Editing FASTA: how to change FASTA headers using regex and lists, and replace the file

Question (votes: 0, answers: 1)

I have a FASTA file that looks like this:

    >NZ_UARI01000011.1 Cronobacter sakazakii strain NCTC11467, whole genome shotgun sequence
    GCGCATTTCTTATTACGGAGAAATACAGCAGCGTGTCTGTTTCAATTTTCAGCTTGTTCCGGATTGTTAAAGAGCAAATACTT ...
    >NZ_UARI01000001.1 Cronobacter sakazakii strain NCTC11467, whole genome shotgun sequence
    CAATTTTACTTGTTGATATAACAATCACGCTAACTATTCAGCCAATAGCTCCCGCATTAAAACCAGCTACTTCAGCCAAA...

I want to change the headers to this:

    '>Cronobacter sakazakii strain NCTC11467_1
    GCGCATTTCTTATTACGGAGAAATACAGCAGCGTGTCTGTTTCAATTTTCAGCTTGTTCCGGATTGTTAAAGAGCAAATACTT ...
    '>Cronobacter sakazakii strain NCTC11467_2
    CAATTTTACTTGTTGATATAACAATCACGCTAACTATTCAGCCAATAGCTCCCGCATTAAAACCAGCTACTTCAGCCAAA...

(and so on; ignore the ' at the start of each header)

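Then I want to save the file under the header name. Ideally I don't want to create a new FASTA file, but simply replace the existing file with the corrected one: Cronobacter_sakazakii_strain NCTC11467.fasta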

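This is easy enough to do for a single file, but I have more than 600 files, so editing each one by hand is not the route I want to take. I have written a script in which I use a regular expression to isolate the part of the header I want and store it in a list called new_new. I then want to take those values and substitute them into every line that starts with '>', appending _1, _2, _3, ... (or some other number, as shown above). Can you help me complete this task? If the script I have provided is not worth continuing with and you have a better solution, please let me know.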
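The approach described above (extract the wanted part of each header with a regex, then number the headers in order) can be sketched on a few in-memory lines. This is only an illustration, not the asker's script: the function name `renumber_headers` and the shortened sample sequences are made up for the example, and the pattern is the one from the question with the trailing comma turned into a lookahead so the comma is not captured.

```python
import re

# Pattern adapted from the question: "Cronobacter <word> <word> <rest>",
# stopping just before the last comma on the line.
HEADER_RE = re.compile(r"Cronobacter +\w* +\w* +.*(?=,)", re.IGNORECASE)


def renumber_headers(lines):
    """Return new lines with each '>' header shortened and numbered _1, _2, ..."""
    out = []
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            match = HEADER_RE.search(line)
            # Fall back to the original header text if the pattern does not match.
            name = match.group(0) if match else line[1:]
            count += 1
            out.append(f">{name}_{count}")
        else:
            out.append(line)
    return out


# Shortened, hypothetical sample records in the same shape as the question's file.
sample = [
    ">NZ_UARI01000011.1 Cronobacter sakazakii strain NCTC11467, whole genome shotgun sequence",
    "GCGCATTTCTTATTACGG",
    ">NZ_UARI01000001.1 Cronobacter sakazakii strain NCTC11467, whole genome shotgun sequence",
    "CAATTTTACTTGTTGATA",
]

for row in renumber_headers(sample):
    print(row)
```

Using a counter that only advances on header lines keeps the numbering tied to records rather than to line positions.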

    #usr/bin/python
    import sys
    import os
    import re
    import csv

    #sys.argv[1] =fasta
    #sys.argv[2] = list of header names (mass)

    #Gather existing headers to list (new_new)
    with open(sys.argv[1], "r+") as text_file:
        lines = text_file.readlines()[1:]
        mylist = []
        new_new = []
        for i in lines:
            if '.' in i:
                mylist.append(i)
        pattern = r">*Cronobacter +\w* +\w* +.*[,]"
        regex = re.compile(pattern, re.IGNORECASE)
        for j in mylist:
            for match in regex.finditer(j):
                value = match.group(0)
                new_new.append(value)
        for k in lines:
            if '>' in k:
                k= k.replace('.*',new_new[value])
    text_file.close()
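One detail in the script above is worth isolating: `str.replace` treats its first argument as literal text, so `'.*'` is searched for as a dot followed by an asterisk, not as a wildcard. Pattern-based substitution needs `re.sub`. The snippet below is a small demonstration of that difference, using a header in the same shape as the question's sample. Separately, `new_new[value]` indexes a list with a string, which raises a `TypeError`, and assigning to `k` inside the loop changes neither `lines` nor the file.

```python
import re

header = ">NZ_UARI01000011.1 Cronobacter sakazakii strain NCTC11467, whole genome shotgun sequence"

# str.replace looks for the two literal characters ".*"; they do not occur
# in the header, so the string comes back unchanged.
unchanged = header.replace(".*", "X")

# re.sub interprets ".*" as a pattern; here it rewrites everything after ">".
replaced = re.sub(r"^>.*", ">Cronobacter sakazakii strain NCTC11467_1", header)

print(unchanged == header)
print(replaced)
```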

python regex list loops fasta
1 Answer
0 votes
    import os
    import re
    import sys

    from Bio.SeqIO.FastaIO import SimpleFastaParser

    # sys.argv[1] = path to the FASTA file
    filename = sys.argv[1]

    # Read every (title, sequence) record into memory.
    with open(filename, "r") as text_file:
        fastas = list(SimpleFastaParser(text_file))

    # Keep only the "Cronobacter ... strain ..." part of each title, up to the comma.
    for idx, (title, seq) in enumerate(fastas):
        s = re.search(r"Cronobacter +\w* +\w* +.*(?=,)", title, re.IGNORECASE)
        fastas[idx] = s.group(), seq

    # Name the file after the first cleaned header.
    newfilename = fastas[0][0] + '.fasta'

    # Overwrite the original file, numbering the headers _1, _2, ...
    with open(filename, 'w') as text_file:
        for idx, (title, seq) in enumerate(fastas):
            text_file.write(f'>{title}_{idx + 1}\n{seq}\n')

    os.rename(filename, newfilename)
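Since the question mentions more than 600 files, the same idea can be wrapped in a function and applied to a whole directory. The following is a minimal standard-library sketch, not part of the original answer. The names `rewrite_fasta` and `rewrite_all`, the `*.fasta` glob, and the choice to replace every space in the file name with an underscore are all assumptions made for illustration. Files whose headers do not match the pattern are left untouched, and the renamed file stays in its original directory.

```python
import re
from pathlib import Path

HEADER_RE = re.compile(r"Cronobacter +\w* +\w* +.*(?=,)", re.IGNORECASE)


def rewrite_fasta(path):
    """Rewrite the headers of one FASTA file in place, then rename the file.

    Returns the new Path, or None if the file was left untouched because
    it had no header or a header did not match the pattern.
    """
    path = Path(path)
    out_lines = []
    first_name = None
    count = 0
    with path.open() as handle:
        for line in handle:
            if line.startswith(">"):
                match = HEADER_RE.search(line)
                if match is None:
                    return None  # unexpected header: do not modify the file
                count += 1
                if first_name is None:
                    first_name = match.group(0)
                out_lines.append(f">{match.group(0)}_{count}\n")
            else:
                out_lines.append(line)
    if first_name is None:
        return None
    path.write_text("".join(out_lines))
    # Assumption: spaces in the file name are replaced with underscores.
    new_path = path.with_name(first_name.replace(" ", "_") + ".fasta")
    if new_path != path:
        if new_path.exists():
            # Two files with the same strain name would otherwise collide.
            raise FileExistsError(new_path)
        path.rename(new_path)
    return new_path


def rewrite_all(directory):
    """Apply rewrite_fasta to every *.fasta file directly inside `directory`."""
    results = {}
    # sorted() materialises the listing before any file is renamed.
    for fasta in sorted(Path(directory).glob("*.fasta")):
        results[fasta.name] = rewrite_fasta(fasta)
    return results
```

Calling `rewrite_all("/path/to/fastas")` (a placeholder path) returns a mapping from each original file name to its new path, or to `None` for files that were skipped. Because the files are overwritten in place, it is worth trying this on a copy of the data first.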