根据条件反转特定字段中的字符串

问题描述 投票:0回答:1

我有这个文件:

m64071_220512_054244/12584899/ccs rev pet047-10055 ACGTGCGACCTTGTGA TTGAGGGTTCAAACGTGCGACCTTGTGA
m64071_220512_054244/128321000/ccs rev pet047-10055 ACGTGCGACCTTGTGA TTGAGGGTTCAAACGTGCGACCTTGTGA
m64071_220512_054244/132186699/ccs fwd pet047-10055 TCACAAGGTCGCACGT TCACAAGGTCGCACGTTTGAACCCTCAA
m64071_220512_054244/134874748/ccs fwd pet047-10055 TCACAAGGTCGCACGT TCACAAGGTCGCACGTTTGAACCCTCAA

仅当

tr
 时,我才需要 
reverse
$2==rev

字段 $4 和 $5

期待:

m64071_220512_054244/12584899/ccs rev pet047-10055 TCACAAGGTCGCACGT TCACAAGGTCGCACGTTTGAACCCTCAA
m64071_220512_054244/128321000/ccs rev pet047-10055 TCACAAGGTCGCACGT TCACAAGGTCGCACGTTTGAACCCTCAA
m64071_220512_054244/132186699/ccs fwd pet047-10055 TCACAAGGTCGCACGT TCACAAGGTCGCACGTTTGAACCCTCAA
m64071_220512_054244/134874748/ccs fwd pet047-10055 TCACAAGGTCGCACGT TCACAAGGTCGCACGTTTGAACCCTCAA

我试过了:

perl -lpe 'if(/rev/) {$rev=/rev/;next}; if ($rev) {$F[4,5]=~tr/ATGC/TACG/; $F[4,5]=reverse $F[4,5]; print "@F"}' file 

我还尝试使用 awk(在 awk 中执行 bash 命令并打印命令输出

awk '{
            if($2==rev)
        {
            cmd1="echo \047" $4 "\047 | rev | tr \047ATGC\047 \047TACG\047" 
            cmd2="echo \047" $5 "\047 | rev | tr \047ATGC\047 \047TACG\047"
            newVar1=((cmd1 | getline line) > 0 ? line : "failed") 
            newVar2=((cmd2 | getline line) > 0 ? line : "failed")
            close(cmd)
            print $1, $2, $3, newVar1, newVar2
        }
        else {print}
}' file
bash perl awk
1个回答
0
投票

这是在 Python 中执行此操作的方法:

#!/usr/bin/env python

import io
import sys

RECORDS_STR = '''m64071_220512_054244/12584899/ccs rev pet047-10055 ACGTGCGACCTTGTGA TTGAGGGTTCAAACGTGCGACCTTGTGA
m64071_220512_054244/128321000/ccs rev pet047-10055 ACGTGCGACCTTGTGA TTGAGGGTTCAAACGTGCGACCTTGTGA
m64071_220512_054244/132186699/ccs fwd pet047-10055 TCACAAGGTCGCACGT TCACAAGGTCGCACGTTTGAACCCTCAA
m64071_220512_054244/134874748/ccs fwd pet047-10055 TCACAAGGTCGCACGT TCACAAGGTCGCACGTTTGAACCCTCAA'''

'''
fast pure-Python reverse complement, courtesy of Devon Ryan
ref. https://bioinformatics.stackexchange.com/a/3585/776
'''
DNA_TABLE = str.maketrans("ACTGactg", "TGACtgac")
def reverse_complement(seq):
    return seq.translate(DNA_TABLE)[::-1]

def main():
    records = io.StringIO(RECORDS_STR) # replace with sys.stdin etc.
    for line in records:
        elems = line.rstrip().split()
        if elems[1] == 'rev':
            elems[3] = reverse_complement(elems[3])
            elems[4] = reverse_complement(elems[4])
        sys.stdout.write('{}\n'.format('\t'.join(elems)))

if __name__ == "__main__":
    main()

输出:

m64071_220512_054244/12584899/ccs   rev pet047-10055    TCACAAGGTCGCACGT    TCACAAGGTCGCACGTTTGAACCCTCAA
m64071_220512_054244/128321000/ccs  rev pet047-10055    TCACAAGGTCGCACGT    TCACAAGGTCGCACGTTTGAACCCTCAA
m64071_220512_054244/132186699/ccs  fwd pet047-10055    TCACAAGGTCGCACGT    TCACAAGGTCGCACGTTTGAACCCTCAA
m64071_220512_054244/134874748/ccs  fwd pet047-10055    TCACAAGGTCGCACGT    TCACAAGGTCGCACGTTTGAACCCTCAA
© www.soinside.com 2019 - 2024. All rights reserved.