Editing complex IDs in a large text file with sed


I want to change the IDs in this large text file so that only the second field between the ";" separators is kept. See the example below.

From this:

AT1G01030;AT1G01030.2;atRTD2::1:11648-13714
TTATATACAAAATTGAAAAGATGCGAGTTTCAACATGGTGACAAAAGCCTAATGATGATGAACATCAAGAAACATGTCGGAAAAAAAAATCATAACCAAAAAAACGAAGAAGATCGTTTTTTCTTCCTCTCACTAGCTAGAATCTAATACCCTTAGAAAAATTACTAATGAAACAATATAAAGAGAGATTCAAAACAAGAAGATGATGAAACTTCTCATGGATTGAAATTGAGAGAAAGTGAAGACTTCCCTTTCTTAGCAAATTGATCATCATCGCCATCATCACCATCATCATTATCA
AT1G01040;AT1G01040.1;Araport::1:23120-31227
GTGGAAAACAGACCAGAAGAGAGAGGAAGACGAAGAGAGAAACAGAACAGAGTAGGGATCGATAGACCGTGGAATCTCAGAATCACAAACACTTTGCAAAAGGGTTTTCAATTCCTATTTATTTACAAAGAAATCATCAATAGTAGTGGTCTCTAGGGTTTTGCTTGCTCTTCTTCGTGACCCCTTTTTACCTGCAAACAACAACTTCAAAATTGGCGTGTTTCGTACGGTCTATCTAACCCTAATCTGTCACAAAACACTCTTCTTCTCTCACCCCTTTTTCTGGGTTTATTCAATTCTCGTGCTTTTGGTTCTGTTTTCTTCTCTGGGGATTTGGTTTTCTTGAGTGAGTTTTTCTCCTCTTTCTTATGTTCTTGATTTGATTATTATATAGAATTAT
AT1G01040-AT1G01046;AT1G01040-AT1G01046.1;Isoseq::1:23134-31211
AGAAGAGAGAGGAAGACGAAGAGAGAAACAGAACAGAGTAGGGATCGATAGACCGTGGAATCTCAGAATCACAAACACTTTGCAAAAGGGTTTTCAATTCCTATTTATTTACAAAGAAATCATCAATAGTAGTGGTCTCTAGGGTTTTGCTTGCTCTTCTTCGTGACCCCTTTTTACCTGCAAACAACAACTTCAAAATT

To this:

AT1G01030.2
TTATATACAAAATTGAAAAGATGCGAGTTTCAACATGGTGACAAAAGCCTAATGATGATGAACATCAAGAAACATGTCGGAAAAAAAAATCATAACCAAAAAAACGAAGAAGATCGTTTTTTCTTCCTCTCACTAGCTAGAATCTAATACCCTTAGAAAAATTACTAATGAAACAATATAAAGAGAGATTCAAAACAAGAAGATGATGAAACTTCTCATGGATTGAAATTGAGAGAAAGTGAAGACTTCCCTTTCTTAGCAAATTGATCATCATCGCCATCATCACCATCATCATTATCA
AT1G01040.1
GTGGAAAACAGACCAGAAGAGAGAGGAAGACGAAGAGAGAAACAGAACAGAGTAGGGATCGATAGACCGTGGAATCTCAGAATCACAAACACTTTGCAAAAGGGTTTTCAATTCCTATTTATTTACAAAGAAATCATCAATAGTAGTGGTCTCTAGGGTTTTGCTTGCTCTTCTTCGTGACCCCTTTTTACCTGCAAACAACAACTTCAAAATTGGCGTGTTTCGTACGGTCTATCTAACCCTAATCTGTCACAAAACACTCTTCTTCTCTCACCCCTTTTTCTGGGTTTATTCAATTCTCGTGCTTTTGGTTCTGTTTTCTTCTCTGGGGATTTGGTTTTCTTGAGTGAGTTTTTCTCCTCTTTCTTATGTTCTTGATTTGATTATTATATAGAATTAT
AT1G01040-AT1G01046.1
AGAAGAGAGAGGAAGACGAAGAGAGAAACAGAACAGAGTAGGGATCGATAGACCGTGGAATCTCAGAATCACAAACACTTTGCAAAAGGGTTTTCAATTCCTATTTATTTACAAAGAAATCATCAATAGTAGTGGTCTCTAGGGTTTTGCTTGCTCTTCTTCGTGACCCCTTTTTACCTGCAAACAACAACTTCAAAATT

How would you do this with sed?

I have already managed to match the region I am interested in with this pattern:

;[a-zA-Z0-9]+\.[0-9]+;
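One thing worth checking about the pattern above: the bracket expression `[a-zA-Z0-9]` does not include `-`, so a hyphenated ID such as `AT1G01040-AT1G01046.1` is not matched. The sketch below counts matching header lines with `grep -E`, using the three header lines from the question. The variable names are only illustrative.

```shell
# Header lines copied from the question's sample.
headers='AT1G01030;AT1G01030.2;atRTD2::1:11648-13714
AT1G01040;AT1G01040.1;Araport::1:23120-31227
AT1G01040-AT1G01046;AT1G01040-AT1G01046.1;Isoseq::1:23134-31211'

# Count the header lines matched by the pattern from the question.
matched=$(printf '%s\n' "$headers" | grep -cE ';[a-zA-Z0-9]+\.[0-9]+;')

# Same pattern with a literal '-' added at the end of the bracket expression.
matched_with_hyphen=$(printf '%s\n' "$headers" | grep -cE ';[a-zA-Z0-9-]+\.[0-9]+;')

echo "original pattern matched:   $matched of 3 headers"
echo "hyphen-aware pattern matched: $matched_with_hyphen of 3 headers"
```

A pattern based on "anything that is not a semicolon" (`[^;]*`) avoids having to list every character that may occur inside an ID.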
1 Answer

Using sed:

$ sed -E 's/[[:alnum:]][^;]*;([^;]*).*/\1/' input_file
> AT1G01030.2
> TTATATACAAAATTGAAAAGATGCGAGTTTCAACATGGTGACAAAAGCCTAATGATGATGAACATCAAGAAACATGTCGGAAAAAAAAATCATAACCAAAAAAACGAAGAAGATCGTTTTTTCTTCCTCTCACTAGCTAGAATCTAATACCCTTAGAAAAATTACTAATGAAACAATATAAAGAGAGATTCAAAACAAGAAGATGATGAAACTTCTCATGGATTGAAATTGAGAGAAAGTGAAGACTTCCCTTTCTTAGCAAATTGATCATCATCGCCATCATCACCATCATCATTATCA
> AT1G01040.1
> GTGGAAAACAGACCAGAAGAGAGAGGAAGACGAAGAGAGAAACAGAACAGAGTAGGGATCGATAGACCGTGGAATCTCAGAATCACAAACACTTTGCAAAAGGGTTTTCAATTCCTATTTATTTACAAAGAAATCATCAATAGTAGTGGTCTCTAGGGTTTTGCTTGCTCTTCTTCGTGACCCCTTTTTACCTGCAAACAACAACTTCAAAATTGGCGTGTTTCGTACGGTCTATCTAACCCTAATCTGTCACAAAACACTCTTCTTCTCTCACCCCTTTTTCTGGGTTTATTCAATTCTCGTGCTTTTGGTTCTGTTTTCTTCTCTGGGGATTTGGTTTTCTTGAGTGAGTTTTTCTCCTCTTTCTTATGTTCTTGATTTGATTATTATATAGAATTAT
> AT1G01040-AT1G01046.1
> AGAAGAGAGAGGAAGACGAAGAGAGAAACAGAACAGAGTAGGGATCGATAGACCGTGGAATCTCAGAATCACAAACACTTTGCAAAAGGGTTTTCAATTCCTATTTATTTACAAAGAAATCATCAATAGTAGTGGTCTCTAGGGTTTTGCTTGCTCTTCTTCGTGACCCCTTTTTACCTGCAAACAACAACTTCAAAATT
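As a rough breakdown of how the substitution above works, here is a small self-contained sketch. The sample file is a stand-in: its header lines come from the question, but the sequence lines are shortened, and the `>`-prefixed line is an assumption about FASTA-style input that the question does not show.

```shell
# Stand-in sample file (shortened sequences; not the real data).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
AT1G01030;AT1G01030.2;atRTD2::1:11648-13714
TTATATACAAAATTGAAAAG
AT1G01040-AT1G01046;AT1G01040-AT1G01046.1;Isoseq::1:23134-31211
AGAAGAGAGAGGAAGACGAA
EOF

# Pieces of the regular expression:
#   [[:alnum:]][^;]*;   first field: from the first alphanumeric character
#                       up to and including the first ';'
#   ([^;]*)             second field (everything up to the next ';'), saved as \1
#   .*                  rest of the line, discarded
# Lines without a ';' (the sequences) do not match and pass through unchanged.
result=$(sed -E 's/[[:alnum:]][^;]*;([^;]*).*/\1/' "$tmp")
printf '%s\n' "$result"

# Assumption: if headers carried a leading '>' (FASTA style), the match would
# start at the first alphanumeric character, so the '>' would be kept.
fasta_like=$(printf '>AT1G01030;AT1G01030.2;atRTD2\n' |
  sed -E 's/[[:alnum:]][^;]*;([^;]*).*/\1/')
printf '%s\n' "$fasta_like"

rm -f "$tmp"
```

To keep the result, redirect the output to a new file, for example `sed -E '...' input_file > output_file`, rather than overwriting the large input directly.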