Using while read line and sed to replace patterns with other patterns from two lists

I am trying to replace certain patterns with other patterns in multiple files. For example, my infile looks like this:

>Genus_species_SRR13259292|ENSG00000000457_ENST00000367772
TACGCCGCGCACTTCACGCGAGAGCAGCTGCGCACTATCGTCCTGCCCCAGGTGCTGCTGGGCCTGCGAGACACCAGCACCCCCATCGTGGCCATCACCCTGCACAGCCTCGCCGTGCTGGTCTCCCTGCTCGGACCAGAGGTGGTTGTGGGCGGAGAAAGAACCAAGATCTTCAAACGCACTGCCCCCAGCTTTACAAAAACCACTGACCTCTCCCCAGAAGAC

The output I want:

>Genus_species_Something_something|ENSG00000000457_ENST00000367772
TACGCCGCGCACTTCACGCGAGAGCAGCTGCGCACTATCGTCCTGCCCCAGGTGCTGCTGGGCCTGCGAGACACCAGCACCCCCATCGTGGCCATCACCCTGCACAGCCTCGCCGTGCTGGTCTCCCTGCTCGGACCAGAGGTGGTTGTGGGCGGAGAAAGAACCAAGATCTTCAAACGCACTGCCCCCAGCTTTACAAAAACCACTGACCTCTCCCCAGAAGAC

I have two list files. My old patterns:

Genus_species_SRR13259292

and my new patterns:

Genus_species_Something_something

I tried to do this with sed. Here is my command:

while IFS= read -r line1 && IFS= read -r line2 <&3; do
    for f in *.fasta; do
        sed -e "s/${line1}/${line2}/g" "$f" > "${f%.fasta}_NewName.fasta"
    done
done < "List_oldpattern.txt" 3<"List_newpatterns.txt"

But this does not work, maybe because > and | delimit the pattern?

If sed won't work, can awk be used?

Thanks for any suggestions.

awk while-loop
1 Answer

Since the question is tagged awk, I suggest replacing all of OP's current code with a single awk script...

My sample .fasta files:

$ head f?.fasta
==> f1.fasta <==
>Genus_species_SRR13259292|ENSG00000000457_ENST00000367772
TACGCCGCGCACTTCACGCGAGAGCAGCTGCGCACTATCGTCCTGCCCCAGGTGCTGC....

>Genus_buckets_ABC13259292|ENSG00000000457_ENST00000367772
TACGCCGCGCACTTCACGCGAGAGCAGCTGCGCACTATCGTCCTGCCCCAGGTGCTGC....

==> f2.fasta <==
>Genus_species_SRR13259292|ENSG00000000457_ENST00000367772
TACGCCGCGCACTTCACGCGAGAGCAGCTGCGCACTATCGTCCTGCCCCAGGTGCTGC....

>Genus_buckets_ABC13259292|ENSG00000000457_ENST00000367772
TACGCCGCGCACTTCACGCGAGAGCAGCTGCGCACTATCGTCCTGCCCCAGGTGCTGC....

We'll use the paste command to join OP's old and new patterns onto a single line, with | as the delimiter:

$ paste -d'|' List_oldpattern.txt List_newpatterns.txt
Genus_species_SRR13259292|Genus_species_Something_something
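
To see how paste lines up several pairs, here is a small sketch with a second, made-up pair (Genus_buckets_ABC13259292 and Genus_buckets_Renamed are hypothetical and only serve the illustration). It assumes both lists have the same number of lines and that no pattern contains |.

```shell
# Work in a scratch directory so nothing real is overwritten.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# Hypothetical lists: line N of the old list pairs with line N of the new list.
printf '%s\n' Genus_species_SRR13259292 Genus_buckets_ABC13259292 > List_oldpattern.txt
printf '%s\n' Genus_species_Something_something Genus_buckets_Renamed > List_newpatterns.txt

# One old|new pair per output line.
pairs=$(paste -d'|' List_oldpattern.txt List_newpatterns.txt)
printf '%s\n' "$pairs"
```

Each output line then carries one old|new pair, which is the layout the awk script reads as its first input.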

Now the awk script:

awk '
BEGIN     { FS = OFS = "|" }                    # input/output field delimiter
FNR==NR   { map[">" $1] = ">" $2; next }        # 1st file (paste output): populate our map[] array; $1==old $2==new; then skip to next input line
FNR==1    { close(outf)                         # 2nd-nth files: 1st record; close previous output file
            outf = FILENAME                     # make copy of input FILENAME
            sub(/\.fasta$/,"",outf)             # strip trailing ".fasta"
            outf = outf "_NewName.fasta"        # append new suffix to our output filename
          }
$1 in map { $1 = map[$1] }                      # if 1st field (">some_string") is an index in the map[] array then replace 1st field with array contents
          { print > outf }                      # print current line to output file

' <(paste -d'|' List_oldpattern.txt List_newpatterns.txt) *.fasta
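
The FNR==NR test is what separates the lookup input from the data files: NR counts records across all inputs while FNR restarts at 1 for each file, so the two are equal only while the first input is being read. A minimal sketch of that idiom on two tiny made-up files (pairs.txt and data.txt are hypothetical names, not part of OP's data):

```shell
# Work in a scratch directory so nothing real is overwritten.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

printf '%s\n' 'old1|new1' 'old2|new2' > pairs.txt        # lookup file: old|new
printf '%s\n' 'old1|x' 'other|y' 'old2|z' > data.txt     # data file

result=$(awk '
BEGIN     { FS = OFS = "|" }
FNR==NR   { map[$1] = $2; next }   # first file only: remember old -> new
$1 in map { $1 = map[$1] }         # later files: swap field 1 when it is a known key
          { print }
' pairs.txt data.txt)
printf '%s\n' "$result"
```

Lines whose first field is not a key in map[] (here other|y) pass through unchanged, which is also why the sequence lines in the .fasta files are left alone.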

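NOTE: assuming OP has multiple old/new pattern pairs, an added benefit of this script is that each *.fasta file is scanned only once (as opposed to OP's current while/read/for/sed loop, which scans each .fasta file N times, where N is the number of old/new pattern pairs).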

This generates:

$ head *_NewName.fasta
==> f1_NewName.fasta <==
>Genus_species_Something_something|ENSG00000000457_ENST00000367772
TACGCCGCGCACTTCACGCGAGAGCAGCTGCGCACTATCGTCCTGCCCCAGGTGCTGC....

>Genus_buckets_ABC13259292|ENSG00000000457_ENST00000367772
TACGCCGCGCACTTCACGCGAGAGCAGCTGCGCACTATCGTCCTGCCCCAGGTGCTGC....

==> f2_NewName.fasta <==
>Genus_species_Something_something|ENSG00000000457_ENST00000367772
TACGCCGCGCACTTCACGCGAGAGCAGCTGCGCACTATCGTCCTGCCCCAGGTGCTGC....

>Genus_buckets_ABC13259292|ENSG00000000457_ENST00000367772
TACGCCGCGCACTTCACGCGAGAGCAGCTGCGCACTATCGTCCTGCCCCAGGTGCTGC....
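
If sed is preferred, a rough sketch of the same single-pass idea is to turn the pair list into a sed script and run it once per file. > and | are ordinary characters in sed's default (basic) regular expressions, so they are not a problem in themselves. With several pairs, though, a loop that runs sed once per pair rewrites the same output file from the unmodified input each time, so only the last pair's substitution survives. The file name rename.sed is made up for this example, the sample data (with a shortened sequence) is created in a scratch directory, and the sketch assumes the patterns contain no regex metacharacters, /, & or |.

```shell
# Work in a scratch directory so nothing real is overwritten.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# Sample input mirroring the files in the answer (sequence shortened for the demo).
printf '%s\n' '>Genus_species_SRR13259292|ENSG00000000457_ENST00000367772' 'TACGCC' \
              '>Genus_buckets_ABC13259292|ENSG00000000457_ENST00000367772' 'TACGCC' > f1.fasta
printf '%s\n' Genus_species_SRR13259292 > List_oldpattern.txt
printf '%s\n' Genus_species_Something_something > List_newpatterns.txt

# One s-command per pair, anchored to the ">" header prefix and the "|" that ends the name.
paste -d'|' List_oldpattern.txt List_newpatterns.txt |
  awk -F'|' '{ printf "s/^>%s|/>%s|/\n", $1, $2 }' > rename.sed

# Each input file is read once, with every substitution applied in that pass.
for f in *.fasta; do
    sed -f rename.sed "$f" > "${f%.fasta}_NewName.fasta"
done
cat f1_NewName.fasta
```

Anchoring on the leading > and the trailing | keeps a shorter name from matching inside a longer one, which a bare s/old/new/g would not guarantee.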