Grep a column value within the same line and print the line


I have this 5-column file:

m64071_220512_054244/46858502/ccs TCTACACGACGCTCTTCCGATCTTATTGGGCACGGTGTCGCCATCTGATCGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCGAGGTTTGCAGCTATTTTATTTACAAGTATACATTTAACACAATGAAATAAACACTGATATACTGAAGCCTAGTTAATAGTAGTGTAACAATATGCATCATTTTGATGATTACATTATTTTAAACAACAAACTACACTGAAAAATTAATGCCGATAAAATTCTTGGTCATAATATTAAGAAATACAATATATAAATTGAAAATATGATTGCTTAAAATTTGAAAATGGAAGTGAACTCATTTGGACAGACTCAGAGTTAACATAATCTGAAGGGAGGGGAGCTCTGACCCAAATGATATCTTTCAGGTTAACAGAAGAAAAAAGAAGCATAGTTTATCTTCAAGGAGAACGGGCAGTTTGCTTCTTCAGGTA fwd pet047-9952 TATTGGGCACGGTGTC
m64071_220512_054244/52233509/ccs AGCTTTTTTGGAATCTTCTGCTAAAGAAAATCAGACTGCTGTGGATGTTTTTCGAAGGATAATTTTGGAGGCAGAAAAAATGGACGGGGCAGCTTCACAAGGCAAGTCTTCATGCTCGGTGATGTGATTCTGCTGCAAAGCCTGAGGACACTGGGAATATATTCTACCTGAAGAAGCAAACTGCCCGTTCTCCTTGAAGATAAACTATGCTTCTTTTTTCTTCTGTTAACCTGAAAGATATCATTTGGGTCAGAGCTCCCCTCCCTTCAGATTATGTTAACTCTGAGTCTGTCCAAATGAGTTCACTTCCATTTTCAAATTTTAAGCAATCATATTTTCAATTTATATATTGTATTTCTTAATATTATGACCAAGAATTTTATCGGCATTAATTTTTCAGTGTAGTTTGTTGTTTAAAATAATGTAATCATCAAAATGATGCATATTGTTACACTACTATTAACTAGGCTTCAGTATATCAGTGTTTATTTCATTGTGTTAAATGTATACTTGTAAATAAAATAGCTGCAAACCTCGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACGATCAGATGGCGACACCGTGCCCAATAAGATCGGAAGAGCGTCGTGTAGA rev pet047-9952 GACACCGTGCCCAATA
m64071_220512_054244/91226755/ccs TCTACACGACGCTCTTCCGATCTTATTGGGCACGGTGTCGCCATCTGATCGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCGAGGTTTGCAGCTATTTTATTTACAAGTATACATTTAACACAATGAAATAAACACTGATATACTGAAGCCTAGTTAATAGTAGTGTAACAATATGCATCATTTTGATGATTACATTATTTTAAACAACAAACTACACTGAAAAATTAATGCCGATAAAATTCTTGGTCATAATATTAAGAAATACAATATATAAATTGAAAATATGATTGCTTAAAATTTGAAAATGGAAGTGAACTCATTTGGACAGACTCAGAGTTAACATAATCTGAAGGGAGGGGAGCTCTGACCCAAATGATATCTTTCAGGTTAACAGAAGAAAAAAGAAGCATAGTTTATCTTCAAGGAGAACGGGCAGTTTGCTTCTTCAGGTAGAATATATTCCCAGTGTCCTCAGGCTTTGCAGCAGAATCACATCACCGAGCATGAAGACTTGCCTTGTGAAGCTGCCCCGTCCATTTTTTCTGCCTCCAA fwd pet047-9952 TATTGGGCACGGTGTC

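For each line, I need to grep for the value in the last column, $5, inside the second field, $2. Then I need to print the same line with an extra column, $6, built from the grep result as follows:

if ($3 == "rev"): $6 is the 12 characters before the match, followed by the match itself
if ($3 == "fwd"): $6 is the match itself, followed by the 12 characters after it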

awk '$2~/$5/ {match($0, /$5/); if ($4=="rev") print substr($0, RSTART +12, RLENGTH + 12); else print substr($0, RSTART + 0, RLENGTH + 12) ;}' file
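Why the attempt above prints nothing: between slashes, /$5/ is a regex literal, not the fifth field. In awk's extended regex syntax it means "end of string followed by the character 5", so $2 ~ /$5/ is never true. There are also a few other slips: the strand is in $3, not $4; for rev the window must start 12 characters before RSTART, not 12 after it; and only the substring is printed instead of the whole line plus a new column. Here is a minimal sketch of the same match()-based idea with those points fixed. It is wrapped in a shell function, add_flank_match, which is a hypothetical name chosen so the function can read a file argument or stdin:

```shell
# Hypothetical helper name; reads the 5-column file(s) given as arguments, or stdin.
add_flank_match() {
    awk '{
        # $5 without slashes is used as a dynamic regex, searched in $2 only
        if (match($2, $5)) {
            if ($3 == "rev") {
                # rev: the 12 characters before the match, then the match
                extra = substr($2, RSTART - 12, RLENGTH + 12)
            } else {
                # fwd: the match, then the 12 characters after it
                extra = substr($2, RSTART, RLENGTH + 12)
            }
            # print the whole line with the window appended as $6
            print $0, extra
        }
    }' "$@"
}
```

Usage: add_flank_match file. Because the $5 values contain only A, C, G and T, using them as a regex is harmless here. The index() call in the answer below avoids regex interpretation altogether.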

Each $5 value is 16 characters long, and the extra stretch I want is always 12 characters, so my $6 output is 28 characters.
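The 28 comes from 16 + 12. However, awk's substr() raises no error when the requested window runs off the end of $2; it returns whatever part of $2 exists. For example, a fwd match that sits fewer than 12 characters from the end of $2 silently gives a $6 shorter than 28. The following is a minimal sketch, assuming the 6-column layout of the expected output below. It flags any row whose $6 is not exactly length($5) + 12 characters. The name check_flank_length is hypothetical:

```shell
# Hypothetical helper name; reads 6-column output (file arguments or stdin)
# and reports rows whose $6 is not length($5) + 12 characters (16 + 12 = 28 here).
check_flank_length() {
    awk '{
        want = length($5) + 12
        if (length($6) != want)
            printf "line %d: $6 has %d characters, expected %d\n", NR, length($6), want
    }' "$@"
}
```

An empty result means every row has the full 28-character window.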

Expected output:

m64071_220512_054244/46858502/ccs TCTACACGACGCTCTTCCGATCTTATTGGGCACGGTGTCGCCATCTGATCGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCGAGGTTTGCAGCTATTTTATTTACAAGTATACATTTAACACAATGAAATAAACACTGATATACTGAAGCCTAGTTAATAGTAGTGTAACAATATGCATCATTTTGATGATTACATTATTTTAAACAACAAACTACACTGAAAAATTAATGCCGATAAAATTCTTGGTCATAATATTAAGAAATACAATATATAAATTGAAAATATGATTGCTTAAAATTTGAAAATGGAAGTGAACTCATTTGGACAGACTCAGAGTTAACATAATCTGAAGGGAGGGGAGCTCTGACCCAAATGATATCTTTCAGGTTAACAGAAGAAAAAAGAAGCATAGTTTATCTTCAAGGAGAACGGGCAGTTTGCTTCTTCAGGTA fwd pet047-9952 TATTGGGCACGGTGTC TATTGGGCACGGTGTCGCCATCTGATCG
m64071_220512_054244/52233509/ccs AGCTTTTTTGGAATCTTCTGCTAAAGAAAATCAGACTGCTGTGGATGTTTTTCGAAGGATAATTTTGGAGGCAGAAAAAATGGACGGGGCAGCTTCACAAGGCAAGTCTTCATGCTCGGTGATGTGATTCTGCTGCAAAGCCTGAGGACACTGGGAATATATTCTACCTGAAGAAGCAAACTGCCCGTTCTCCTTGAAGATAAACTATGCTTCTTTTTTCTTCTGTTAACCTGAAAGATATCATTTGGGTCAGAGCTCCCCTCCCTTCAGATTATGTTAACTCTGAGTCTGTCCAAATGAGTTCACTTCCATTTTCAAATTTTAAGCAATCATATTTTCAATTTATATATTGTATTTCTTAATATTATGACCAAGAATTTTATCGGCATTAATTTTTCAGTGTAGTTTGTTGTTTAAAATAATGTAATCATCAAAATGATGCATATTGTTACACTACTATTAACTAGGCTTCAGTATATCAGTGTTTATTTCATTGTGTTAAATGTATACTTGTAAATAAAATAGCTGCAAACCTCGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACGATCAGATGGCGACACCGTGCCCAATAAGATCGGAAGAGCGTCGTGTAGA rev pet047-9952 GACACCGTGCCCAATA CGATCAGATGGCGACACCGTGCCCAATA
m64071_220512_054244/91226755/ccs TCTACACGACGCTCTTCCGATCTTATTGGGCACGGTGTCGCCATCTGATCGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCGAGGTTTGCAGCTATTTTATTTACAAGTATACATTTAACACAATGAAATAAACACTGATATACTGAAGCCTAGTTAATAGTAGTGTAACAATATGCATCATTTTGATGATTACATTATTTTAAACAACAAACTACACTGAAAAATTAATGCCGATAAAATTCTTGGTCATAATATTAAGAAATACAATATATAAATTGAAAATATGATTGCTTAAAATTTGAAAATGGAAGTGAACTCATTTGGACAGACTCAGAGTTAACATAATCTGAAGGGAGGGGAGCTCTGACCCAAATGATATCTTTCAGGTTAACAGAAGAAAAAAGAAGCATAGTTTATCTTCAAGGAGAACGGGCAGTTTGCTTCTTCAGGTAGAATATATTCCCAGTGTCCTCAGGCTTTGCAGCAGAATCACATCACCGAGCATGAAGACTTGCCTTGTGAAGCTGCCCCGTCCATTTTTTCTGCCTCCAA fwd pet047-9952 TATTGGGCACGGTGTC TATTGGGCACGGTGTCGCCATCTGATCG

But I am not getting the output I want.

bash awk grep
1 Answer

You can use the following script to get the result you want:

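awk '{
    # position of the 16-character $5 inside the sequence in $2 (0 if absent)
    idx = index($2, $5);
    if (idx != 0) {
        if ($3 == "rev") {
            # rev: the 12 characters before the match, then the match
            substr_28 = substr($2, idx - 12, 28);
        } else {
            # fwd: the match, then the 12 characters after it
            substr_28 = substr($2, idx, 28);
        }
        # print the original line with the 28-character window appended as $6
        print $0, substr_28;
    }
}' your_file_containing_inp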
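A few properties of this script are worth knowing. First, index() does a plain substring search, so $5 is never interpreted as a regex. Second, rows whose $5 is not found in $2 are silently dropped. Third, the 12 and 28 are hard-coded. Below is a minimal sketch of a variant for the case where you would rather keep unmatched rows and pass the flank length with -v. The function name add_flank and the NA marker are hypothetical, arbitrary choices:

```shell
# Hypothetical variant of the answer script (add_flank is an illustrative name).
# The flank length is passed with -v; rows without a match get NA instead of being dropped.
add_flank() {
    awk -v flank=12 '{
        # index() is a literal substring search: $5 is never treated as a regex
        idx = index($2, $5)
        if (idx == 0) {
            # $5 not found in $2: keep the row, mark the new column as NA
            print $0, "NA"
            next
        }
        if ($3 == "rev") {
            start = idx - flank    # rev: flank characters before the match
        } else {
            start = idx            # fwd: the match, then flank characters after it
        }
        # window length is length($5) + flank, i.e. 16 + 12 = 28 here
        print $0, substr($2, start, length($5) + flank)
    }' "$@"
}
```

Usage: add_flank your_file_containing_inp. To use a different flank length, change flank=12 in the -v assignment.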