How do I linearize FASTA sequences in a file that contains more than just FASTA?

I am running ipcress to do in-silico PCR, and the results look like this:

Ipcress result
Experiment: Primer1
Primers: B A
Target: QLOD02000001.1:filter(unmasked), whole genome shotgun sequence
Matches: 20/20 20/20
Product: 2601 bp (range 100-5000)
Result type: revcomp
ipcress: QLOD02000001.1:filter(unmasked) Primer1 2601 B 91258 0 A 93839 0 revcomp
>F-RK1_product_1 seq QLOD02000001.1:filter(unmasked) start 91258 length 2601
AAGCGGATTGAGAAGTGGTGGTGGTAGTAGCAGTCATGTGGGTAACGAAGACTACAACAGCAGTATTATA
ATTAGGAAAAGGTTTGAAGAAAAGATGAGGCTTGAAAGGGACGACGACGACGACAAGATCTTCAATCCCA
CCAAGTACTTTGTCCAAGAAGTTGTTAATTGCTTTGATGAGTCTGACCTCTACAGAACT...

Ipcress result
Experiment: Primer2
Primers: B A
Target: QLOD02000001.1:filter(unmasked), whole genome shotgun sequence
Matches: 20/20 20/20
Product: 854 bp (range 100-5000)
Result type: revcomp
ipcress: QLOD02000001.1:filter(unmasked) Primer2 854 B 149835 0 A 150669 0 revcomp
>F-RK3_product_1 seq QLOD02000001.1:filter(unmasked) start 149835 length 854
AGGATGACATGGGAATCTGGGACCTCAACCATTTTGTCTAGCTCTCTCCCAAGAGAAAGCGACGAAAATG
ACATGGGTTTGGCTCTGTATTGTTTAACAAATTTAAGTGGCTTAAAAACTCTAC....

Is there a way to linearize these FASTA sequences, and only the sequences? I would like my final file to look like this:

Ipcress result
Experiment: Primer1
Primers: B A
Target: QLOD02000001.1:filter(unmasked), whole genome shotgun sequence
Matches: 20/20 20/20
Product: 2601 bp (range 100-5000)
Result type: revcomp
ipcress: QLOD02000001.1:filter(unmasked) Primer1 2601 B 91258 0 A 93839 0 revcomp
>F-RK1_product_1 seq QLOD02000001.1:filter(unmasked) start 91258 length 2601
AAGCGGATTGAGAAGTGGTGGTGGTAGTAGCAGTCATGTGGGTAACGAAGACTACAACAGCAGTATTATAATTAGGAAAAGGTTTGAAGAAAAGATGAGGCTTGAAAGGGACGACGACGACGACAAGATCTTCAATCCCACCAAGTACTTTGTCCAAGAAGTTGTTAATTGCTTTGATGAGTCTGACCTCTACAGAACT...

Ipcress result
Experiment: Primer2
Primers: B A
Target: QLOD02000001.1:filter(unmasked), whole genome shotgun sequence
Matches: 20/20 20/20
Product: 854 bp (range 100-5000)
Result type: revcomp
ipcress: QLOD02000001.1:filter(unmasked) Primer2 854 B 149835 0 A 150669 0 revcomp
>F-RK3_product_1 seq QLOD02000001.1:filter(unmasked) start 149835 length 854
AGGATGACATGGGAATCTGGGACCTCAACCATTTTGTCTAGCTCTCTCCCAAGAGAAAGCGACGAAAATGACATGGGTTTGGCTCTGTATTGTTTAACAAATTTAAGTGGCTTAAAAACTCTAC....

Tags: awk, fasta

1 Answer

If you are asking how to unwrap the lines between a line starting with > (a FASTA header) and the next empty line, this is fairly simple:

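awk '/^>/ { wrap=1; print; next }   # FASTA header: print it and start collecting
   wrap && /^$/ { print wrapped; wrapped = ""; wrap = 0 }   # empty line: flush the joined sequence
   wrap { wrapped = wrapped $0; next }   # sequence line: append it, print nothing yet
   1   # everything else (including that empty line): print unchanged
   END { if (wrap) print wrapped }' file >newfile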

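Recall that Awk examines one line at a time. If we see a FASTA header, we set wrap to 1 so that we remember this fact, print the current line, and skip to the next line. On subsequent lines, if we see an empty line, we print everything we have collected (the collecting itself is handled by the next line of the script) and stop collecting. Otherwise, if we get this far in the script and wrap is true, we append the current line to the end of wrapped and skip to the next input line. Anything not covered by the previous cases is simply printed. (The Awk idiom 1 is shorthand for this.) Finally, if we are still collecting when the input ends, we must not forget to print wrapped as well.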
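To watch those rules fire on a concrete case, here is a minimal sketch that wraps the same Awk script in a shell function and feeds it a tiny made-up record. The function name unwrap_fasta and the sample lines are my own illustration, not part of the ipcress output above.

```shell
#!/bin/sh
# unwrap_fasta is a hypothetical helper name; it reads stdin and writes stdout.
unwrap_fasta() {
    awk '/^>/ { wrap = 1; print; next }
         wrap && /^$/ { print wrapped; wrapped = ""; wrap = 0 }
         wrap { wrapped = wrapped $0; next }
         1
         END { if (wrap) print wrapped }'
}

# Made-up input: a metadata line, a header, two wrapped sequence lines,
# the empty line that ends the record, and the start of the next record.
printf '%s\n' 'Experiment: Demo' '>demo_product' 'AAGC' 'GGAT' '' 'Experiment: Next' |
    unwrap_fasta
```

This should print the metadata line and the header unchanged, then AAGCGGAT on a single line, then the empty line and the second metadata line. Reading from stdin rather than a named file makes it easy to use in a pipeline, for example unwrap_fasta <file >newfile.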

Demo: https://ideone.com/ZCkKss
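The script above relies on an empty line ending every sequence, which holds for the ipcress output in the question. If, as an assumption, a file might also contain records where one header directly follows the previous sequence, a slightly more defensive sketch could flush the pending sequence whenever a new header appears. The names unwrap_fasta_flush and flush_seq are hypothetical.

```shell
#!/bin/sh
# Variant sketch: also ends a sequence when the next ">" header arrives.
unwrap_fasta_flush() {
    awk 'function flush_seq() {
             # Print the joined sequence, if any, and stop collecting.
             if (wrapped != "") print wrapped
             wrapped = ""; wrap = 0
         }
         /^>/ { flush_seq(); wrap = 1; print; next }   # header ends the previous record
         /^$/ { flush_seq(); print; next }             # empty line ends the record too
         wrap { wrapped = wrapped $0; next }           # sequence line: collect
         { print }                                     # anything else: pass through
         END { flush_seq() }'
}

# Made-up input: two records with no empty line between them.
printf '%s\n' '>a' 'AC' 'GT' '>b' 'TT' '' 'tail' | unwrap_fasta_flush
```

Unlike the original, this variant does not print an empty sequence line for a header that has no sequence after it. It still cannot tell where a sequence ends if non-sequence text follows it without an empty line or a header in between.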
