I have some FASTA sequences, as shown below.
>contig_0,length=363,cov=3.6,min=5,max=7,gc=0.413,left=F_BRANCH_1.0,right=F_BRANCH_1.0 rightEdges=(95156-1-1-2-A)
CTTCATTTTTCCCTAGTCCCTTTCCTGGTATATATCCCATCTTGGTCATGATTTTTTGACTCGTGGGGCT
A
>contig_1,length=359,cov=4.8,min=2,max=8,gc=0.482,left=DEAD_END,right=DEAD_END
CATGGTCTCAATTTTCAACCAACCATGAGACAAAGAAGCTCACGGGAAGCCATCATTCACCCAGCACAAC
ACCTACGAGTAGGAAATCGGCAATGGGCTTCGATATGTGACACCCAGGCAGACGTGCCCTCAACCTAATG
>contig_2,length=363,cov=3.6,min=5,max=7,gc=0.413,left=F_BRANCH_1.0,right=F_BRANCH_1.0 rightEdges=(95156-1-1-2-A)
TACTTTGATACACTTCTGAGGCTGTGCCTATGCCGACAAGTCCTGTAACAGCCTTTTGTTTAGGCCAATT
TTTTGGCCACTGATTTAAAGCAATCCAATTTTTTGGCCACTGATTTAAAGCAATGATAGAGACATCTGCT
CCAGTGTCTACCA
I would like to edit the FASTA headers so that they look like this. Any suggestions would be appreciated.
>contig_0
CTTCATTTTTCCCTAGTCCCTTTCCTGGTATATATCCCATCTTGGTCATGATTTTTTGACTCGTGGGGCT
A
>contig_1
CATGGTCTCAATTTTCAACCAACCATGAGACAAAGAAGCTCACGGGAAGCCATCATTCACCCAGCACAAC
ACCTACGAGTAGGAAATCGGCAATGGGCTTCGATATGTGACACCCAGGCAGACGTGCCCTCAACCTAATG
>contig_2
TACTTTGATACACTTCTGAGGCTGTGCCTATGCCGACAAGTCCTGTAACAGCCTTTTGTTTAGGCCAATT
TTTTGGCCACTGATTTAAAGCAATCCAATTTTTTGGCCACTGATTTAAAGCAATGATAGAGACATCTGCT
CCAGTGTCTACCA
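Match the lines that start with ">", split the header on the "," delimiter, and then rebuild the whole line from the first field only. Sequence lines do not match the pattern and are printed unchanged.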
awk '/^>/{split($0,arr,","); $0 = arr[1]}1' file.fasta
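A variant of the one-liner above uses sub() instead of split(): on header lines it deletes everything from the first comma to the end of the line, which gives the same result. This is only a minimal sketch. The function name simplify_headers and the file names in the usage comment are assumptions chosen for illustration.

```shell
# Hypothetical helper (the name is an assumption): on FASTA header lines,
# keep only the text before the first comma. Sequence lines pass through
# unchanged because the trailing "1" prints every line.
simplify_headers() {
    awk '/^>/ { sub(/,.*/, "") } 1' "$@"
}

# Example usage (file names are placeholders). awk does not edit the file
# in place, so redirect the output to a new file:
# simplify_headers file.fasta > file.renamed.fasta
```

With no file argument the function reads from standard input, so it can also be used at the end of a pipeline.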