I want to change the IDs in this large text file so that only the second of the ";"-separated fields remains. See the example below.
From this:
AT1G01030;AT1G01030.2;atRTD2::1:11648-13714
TTATATACAAAATTGAAAAGATGCGAGTTTCAACATGGTGACAAAAGCCTAATGATGATGAACATCAAGAAACATGTCGGAAAAAAAAATCATAACCAAAAAAACGAAGAAGATCGTTTTTTCTTCCTCTCACTAGCTAGAATCTAATACCCTTAGAAAAATTACTAATGAAACAATATAAAGAGAGATTCAAAACAAGAAGATGATGAAACTTCTCATGGATTGAAATTGAGAGAAAGTGAAGACTTCCCTTTCTTAGCAAATTGATCATCATCGCCATCATCACCATCATCATTATCA
AT1G01040;AT1G01040.1;Araport::1:23120-31227
GTGGAAAACAGACCAGAAGAGAGAGGAAGACGAAGAGAGAAACAGAACAGAGTAGGGATCGATAGACCGTGGAATCTCAGAATCACAAACACTTTGCAAAAGGGTTTTCAATTCCTATTTATTTACAAAGAAATCATCAATAGTAGTGGTCTCTAGGGTTTTGCTTGCTCTTCTTCGTGACCCCTTTTTACCTGCAAACAACAACTTCAAAATTGGCGTGTTTCGTACGGTCTATCTAACCCTAATCTGTCACAAAACACTCTTCTTCTCTCACCCCTTTTTCTGGGTTTATTCAATTCTCGTGCTTTTGGTTCTGTTTTCTTCTCTGGGGATTTGGTTTTCTTGAGTGAGTTTTTCTCCTCTTTCTTATGTTCTTGATTTGATTATTATATAGAATTAT
AT1G01040-AT1G01046;AT1G01040-AT1G01046.1;Isoseq::1:23134-31211
AGAAGAGAGAGGAAGACGAAGAGAGAAACAGAACAGAGTAGGGATCGATAGACCGTGGAATCTCAGAATCACAAACACTTTGCAAAAGGGTTTTCAATTCCTATTTATTTACAAAGAAATCATCAATAGTAGTGGTCTCTAGGGTTTTGCTTGCTCTTCTTCGTGACCCCTTTTTACCTGCAAACAACAACTTCAAAATT
To this:
AT1G01030.2
TTATATACAAAATTGAAAAGATGCGAGTTTCAACATGGTGACAAAAGCCTAATGATGATGAACATCAAGAAACATGTCGGAAAAAAAAATCATAACCAAAAAAACGAAGAAGATCGTTTTTTCTTCCTCTCACTAGCTAGAATCTAATACCCTTAGAAAAATTACTAATGAAACAATATAAAGAGAGATTCAAAACAAGAAGATGATGAAACTTCTCATGGATTGAAATTGAGAGAAAGTGAAGACTTCCCTTTCTTAGCAAATTGATCATCATCGCCATCATCACCATCATCATTATCA
AT1G01040.1
GTGGAAAACAGACCAGAAGAGAGAGGAAGACGAAGAGAGAAACAGAACAGAGTAGGGATCGATAGACCGTGGAATCTCAGAATCACAAACACTTTGCAAAAGGGTTTTCAATTCCTATTTATTTACAAAGAAATCATCAATAGTAGTGGTCTCTAGGGTTTTGCTTGCTCTTCTTCGTGACCCCTTTTTACCTGCAAACAACAACTTCAAAATTGGCGTGTTTCGTACGGTCTATCTAACCCTAATCTGTCACAAAACACTCTTCTTCTCTCACCCCTTTTTCTGGGTTTATTCAATTCTCGTGCTTTTGGTTCTGTTTTCTTCTCTGGGGATTTGGTTTTCTTGAGTGAGTTTTTCTCCTCTTTCTTATGTTCTTGATTTGATTATTATATAGAATTAT
AT1G01040-AT1G01046.1
AGAAGAGAGAGGAAGACGAAGAGAGAAACAGAACAGAGTAGGGATCGATAGACCGTGGAATCTCAGAATCACAAACACTTTGCAAAAGGGTTTTCAATTCCTATTTATTTACAAAGAAATCATCAATAGTAGTGGTCTCTAGGGTTTTGCTTGCTCTTCTTCGTGACCCCTTTTTACCTGCAAACAACAACTTCAAAATT
How would you do this with sed?
I have already matched the region I am interested in with this pattern:
;[a-zA-Z0-9]+\.[0-9]+;
Using sed:
$ sed -E 's/[[:alnum:]][^;]*;([^;]*).*/\1/' input_file
> AT1G01030.2
> TTATATACAAAATTGAAAAGATGCGAGTTTCAACATGGTGACAAAAGCCTAATGATGATGAACATCAAGAAACATGTCGGAAAAAAAAATCATAACCAAAAAAACGAAGAAGATCGTTTTTTCTTCCTCTCACTAGCTAGAATCTAATACCCTTAGAAAAATTACTAATGAAACAATATAAAGAGAGATTCAAAACAAGAAGATGATGAAACTTCTCATGGATTGAAATTGAGAGAAAGTGAAGACTTCCCTTTCTTAGCAAATTGATCATCATCGCCATCATCACCATCATCATTATCA
> AT1G01040.1
> GTGGAAAACAGACCAGAAGAGAGAGGAAGACGAAGAGAGAAACAGAACAGAGTAGGGATCGATAGACCGTGGAATCTCAGAATCACAAACACTTTGCAAAAGGGTTTTCAATTCCTATTTATTTACAAAGAAATCATCAATAGTAGTGGTCTCTAGGGTTTTGCTTGCTCTTCTTCGTGACCCCTTTTTACCTGCAAACAACAACTTCAAAATTGGCGTGTTTCGTACGGTCTATCTAACCCTAATCTGTCACAAAACACTCTTCTTCTCTCACCCCTTTTTCTGGGTTTATTCAATTCTCGTGCTTTTGGTTCTGTTTTCTTCTCTGGGGATTTGGTTTTCTTGAGTGAGTTTTTCTCCTCTTTCTTATGTTCTTGATTTGATTATTATATAGAATTAT
> AT1G01040-AT1G01046.1
> AGAAGAGAGAGGAAGACGAAGAGAGAAACAGAACAGAGTAGGGATCGATAGACCGTGGAATCTCAGAATCACAAACACTTTGCAAAAGGGTTTTCAATTCCTATTTATTTACAAAGAAATCATCAATAGTAGTGGTCTCTAGGGTTTTGCTTGCTCTTCTTCGTGACCCCTTTTTACCTGCAAACAACAACTTCAAAATT
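To make the substitution above easier to follow, here is a minimal sketch that runs it on a small test file. The file names sample.txt and renamed.txt and the shortened sequences are made up for illustration; the /;/ address is an optional addition, not part of the command above, and only restricts the substitution to lines that contain a ";".

```shell
# Hypothetical sample file with shortened sequences, for illustration only.
printf '%s\n' \
  'AT1G01030;AT1G01030.2;atRTD2::1:11648-13714' \
  'TTATATACAAAATTG' \
  'AT1G01040-AT1G01046;AT1G01040-AT1G01046.1;Isoseq::1:23134-31211' \
  'AGAAGAGAGAGGAAG' > sample.txt

# How the pattern is put together:
#   [[:alnum:]][^;]*;   from the first alphanumeric character up to the first ";"
#   ([^;]*)             the second field, captured as \1
#   .*                  the rest of the line, which is discarded
# The /;/ address (an addition here) skips lines without a ";".
sed -E '/;/ s/[[:alnum:]][^;]*;([^;]*).*/\1/' sample.txt > renamed.txt
cat renamed.txt
```

Because the match starts at the first alphanumeric character, anything in front of it on the line is left in place. The capture uses [^;]* rather than a list of allowed characters, so an ID containing a hyphen such as AT1G01040-AT1G01046.1 is kept whole.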