awk: compare two columns across two files and append the adjusted non-matching values to the output

Hi, I'm trying to compare file1 and file2.

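  1. It compares columns 1,2 of file1 with columns 1,3 of file2.
  2. If they match, it writes columns 1,2,7,9 of file1 and columns 1,3,6,7,8 of file2 to the output file.
  3. If they don't match, it appends the remaining non-matching lines from file2 to the output file.
  4. Finally, it puts an incrementing value in the 5th column of the output file.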

I tried this:

awk 'NR==FNR {a[$1,$3]=$0; next}
             {if(($1,$3) in a)
             {print a[$1,$3],$0; delete a[$1,$2]}
             else print $0}
     END    {for(k in a) print a[k]}' file1 file2
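
One way to see why this attempt never produces a joined line is to print the keys it builds. The sketch below is only an illustration: it assumes POSIX sh, awk and mktemp -d, and uses the first two lines of each sample file in a scratch directory. It shows that the attempt keys file1 on $1,$3 (SITE-A AA) while it looks file2 up with $1,$3 (SITE-A SERV-A), so nothing matches; the value that file1 shares with file2's $1,$3 is in file1's $1,$2. It also shows that a[$1,$3]=$0 keeps only the last file1 line for each key.

```shell
#!/bin/sh
# Scratch copies of the first two lines of each sample file.
dir=$(mktemp -d)
printf '%s\n' \
  'SITE-A SERV-A AA 1.00 PPA IP 98a7df9asd7f FEX 98a7df9asd7f_a' \
  'SITE-A SERV-A AA 1.00 PPA IP 98a7df9asd7g FEX 98a7df9asd7f_b' > "$dir/file1"
printf '%s\n' \
  'SITE-A 17 SERV-A 0 39 idx a7sdf9899778 0 0 out_fan pri' \
  'SITE-A 17 SERV-A 1 1 test a7sdf9899779 1 0 out_fan pri' > "$dir/file2"

# The keys the attempt builds: file1 on $1,$3 and file2 also on $1,$3.
keys=$(awk 'NR==FNR { print "file1 key:", $1, $3; next }
                    { print "file2 key:", $1, $3 }' "$dir/file1" "$dir/file2")
printf '%s\n' "$keys"

# a[$1,$3] = $0 overwrites, so both file1 lines collapse into one entry.
kept=$(awk '{ a[$1,$3] = $0 } END { n = 0; for (k in a) n++; print n }' "$dir/file1")
echo "file1 entries kept: $kept"

# The columns of file1 that hold the same values as file2's $1,$3.
shared=$(awk '{ print $1, $2 }' "$dir/file1" | sort -u)
echo "file1 \$1,\$2: $shared"

rm -rf "$dir"
```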

file1

SITE-A SERV-A AA 1.00 PPA IP 98a7df9asd7f FEX 98a7df9asd7f_a     
SITE-A SERV-A AA 1.00 PPA IP 98a7df9asd7g FEX 98a7df9asd7f_b     
SITE-A SERV-A AA 1.00 PPA IP 98a7df9asd7h FEX 98a7df9asd7f_c     
SITE-B SERV-A BB 1.00 DF IP a7sdf9899hhh FEX a7sdf9899hhh_a     
SITE-B SERV-A BB 1.00 DF IP a7sdf9899hhf FEX a7sdf9899hhh_b     
SITE-B SERV-A BB 1.00 AF IP a7sdf9899hhm FEX a7sdf9899hhh_c 

file2

SITE-A 17 SERV-A 0 39 idx a7sdf9899778 0 0 out_fan pri
SITE-A 17 SERV-A 1 1 test a7sdf9899779 1 0 out_fan pri
SITE-A 17 SERV-A 2 32 dummy_host a7sdf9899770 2 0 out_fan pri
SITE-C 22 SERV-A 2 519 dummy_host a7sdf9899772 2 2 out_fan pri  
SITE-C 22 SERV-A 3 520 prod a7sdf9899775 3 out_fan pri  
SITE-C 22 SERV-A 4 521 dev a7sdf9899774 4 out_fan pri 

Desired output:

SITE-A SERV-A idx a7sdf9899778 0
SITE-A SERV-A test a7sdf9899779 1
SITE-A SERV-A dummy_host a7sdf9899770 2
SITE-A SERV-A 98a7df9asd7f_a 98a7df9asd7f 3
SITE-A SERV-A 98a7df9asd7f_b 98a7df9asd7g 4
SITE-A SERV-A 98a7df9asd7f_c 98a7df9asd7h 5
SITE-B SERV-A a7sdf9899hhh_a a7sdf9899hhh 0
SITE-B SERV-A a7sdf9899hhh_b a7sdf9899hhf 1
SITE-B SERV-A a7sdf9899hhh_c a7sdf9899hhm 2
SITE-C SERV-A dummy_host a7sdf9899772 2
SITE-C SERV-A prod a7sdf9899775 3
SITE-C SERV-A dev a7sdf9899774 4 
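
Reading the desired output back against the sample lines, each file2 row appears to be file2's $1 $3 $6 $7 $8, and each file1 row appears to be file1's $1 $2 $9 $7 (the _a/_b/_c label first, then the id) followed by the counter. A small check of that mapping on one line from each file, assuming POSIX sh, awk and mktemp -d:

```shell
#!/bin/sh
dir=$(mktemp -d)
printf '%s\n' 'SITE-A SERV-A AA 1.00 PPA IP 98a7df9asd7f FEX 98a7df9asd7f_a' > "$dir/file1"
printf '%s\n' 'SITE-A 17 SERV-A 0 39 idx a7sdf9899778 0 0 out_fan pri' > "$dir/file2"

# file2 rows: site, service, name, id, and file2's own counter in $8
row2=$(awk '{ print $1, $3, $6, $7, $8 }' "$dir/file2")
# file1 rows: site, service, label ($9), id ($7); the counter is appended separately
row1=$(awk '{ print $1, $2, $9, $7 }' "$dir/file1")

echo "$row2"   # compare with the idx line of the desired output
echo "$row1"   # compare with the 98a7df9asd7f_a line, minus its counter
rm -rf "$dir"
```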

Answer:
$ cat tst.awk
NR==FNR {                         # first file on the command line: file2
    key = $1 FS $3                # site + service
    a[key] = a[key] key OFS $6 OFS $7 OFS $8 ORS   # buffer file2 output lines per key
    cnt[key]++      # or cnt[key] = $8 + 1
    next
}
{                                 # second file: file1
    key = $1 FS $2
    if ( key != prev ) {          # first file1 line for this key:
        printf "%s", a[key]       # flush the buffered file2 lines
        delete a[key]
        prev = key
    }
    print key, $9, $7, cnt[key]++ # file1's $9 and $7, then the running counter
}
END {
    for ( key in a ) {            # file2 keys with no file1 match
        printf "%s", a[key]
    }
}

$ awk -f tst.awk file2 file1
SITE-A SERV-A idx a7sdf9899778 0
SITE-A SERV-A test a7sdf9899779 1
SITE-A SERV-A dummy_host a7sdf9899770 2
SITE-A SERV-A 98a7df9asd7f_a 98a7df9asd7f 3
SITE-A SERV-A 98a7df9asd7f_b 98a7df9asd7g 4
SITE-A SERV-A 98a7df9asd7f_c 98a7df9asd7h 5
SITE-B SERV-A a7sdf9899hhh_a a7sdf9899hhh 0
SITE-B SERV-A a7sdf9899hhh_b a7sdf9899hhf 1
SITE-B SERV-A a7sdf9899hhh_c a7sdf9899hhm 2
SITE-C SERV-A dummy_host a7sdf9899772 2
SITE-C SERV-A prod a7sdf9899775 3
SITE-C SERV-A dev a7sdf9899774 4

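It's not clear whether you want the 5th output field of the file1 lines to continue from the number of lines file2 has for that key, or to follow on from file2's $8 value, so I've included both options, one as a comment.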
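
On the sample data the two options happen to agree for SITE-A, which is why the output cannot tell them apart. The sketch below (POSIX sh, awk and mktemp -d assumed; file2 copied from the question) prints, for each file2 key, the starting value each option would give: the number of file2 lines (cnt[key]++) and the last $8 plus one (cnt[key] = $8 + 1).

```shell
#!/bin/sh
dir=$(mktemp -d)
cat > "$dir/file2" <<'EOF'
SITE-A 17 SERV-A 0 39 idx a7sdf9899778 0 0 out_fan pri
SITE-A 17 SERV-A 1 1 test a7sdf9899779 1 0 out_fan pri
SITE-A 17 SERV-A 2 32 dummy_host a7sdf9899770 2 0 out_fan pri
SITE-C 22 SERV-A 2 519 dummy_host a7sdf9899772 2 2 out_fan pri
SITE-C 22 SERV-A 3 520 prod a7sdf9899775 3 out_fan pri
SITE-C 22 SERV-A 4 521 dev a7sdf9899774 4 out_fan pri
EOF

# Per key: option 1 = number of file2 lines, option 2 = last $8 + 1
starts=$(awk '{ key = $1 FS $3; byCount[key]++; byField[key] = $8 + 1 }
              END { for (k in byCount) print k, byCount[k], byField[k] }' "$dir/file2" | sort)
printf '%s\n' "$starts"
rm -rf "$dir"
```

The choice only matters for a key like SITE-C, whose $8 values (2, 3, 4) do not start at 0; in this sample SITE-C has no file1 lines, so its counter is never used.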

for ( key in a ) will print the remaining file2 lines in an unspecified ("random") order (see https://www.gnu.org/software/gawk/manual/gawk.html#Controlling-Array-Traversal). If that's a problem, keep a separate array of the keys, indexed by an incrementing number, as you read file2 (e.g. if (!(key in a)) keys[++numKeys]=key at the start of that block), then in the END section loop over that array to visit the keys in that order (for (keyNr=1; keyNr<=numKeys; keyNr++) { key=keys[keyNr] ...), i.e.:

$ cat tst.awk
NR==FNR {
    key = $1 FS $3
    if ( !(key in a) ) {          # first time this key is seen in file2:
        keys[++numKeys] = key     # remember its position
    }
    a[key] = a[key] key OFS $6 OFS $7 OFS $8 ORS
    cnt[key]++
    next
}
{
    key = $1 FS $2
    if ( key != prev ) {
        printf "%s", a[key]
        delete a[key]
        prev = key
    }
    print key, $9, $7, cnt[key]++
}
END {
    for ( keyNr=1; keyNr<=numKeys; keyNr++ ) {   # file2 order, not "random" order
        key = keys[keyNr]
        printf "%s", a[key]       # prints nothing for keys already flushed above
    }
}

$ awk -f tst.awk file2 file1
SITE-A SERV-A idx a7sdf9899778 0
SITE-A SERV-A test a7sdf9899779 1
SITE-A SERV-A dummy_host a7sdf9899770 2
SITE-A SERV-A 98a7df9asd7f_a 98a7df9asd7f 3
SITE-A SERV-A 98a7df9asd7f_b 98a7df9asd7g 4
SITE-A SERV-A 98a7df9asd7f_c 98a7df9asd7h 5
SITE-B SERV-A a7sdf9899hhh_a a7sdf9899hhh 0
SITE-B SERV-A a7sdf9899hhh_b a7sdf9899hhf 1
SITE-B SERV-A a7sdf9899hhh_c a7sdf9899hhm 2
SITE-C SERV-A dummy_host a7sdf9899772 2
SITE-C SERV-A prod a7sdf9899775 3
SITE-C SERV-A dev a7sdf9899774 4
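
To check the ordered script mechanically against the desired output from the question, the sketch below writes the sample files, the desired output and the script into a scratch directory, runs it with file2 first, and compares the result byte for byte with cmp. It assumes POSIX sh, awk and mktemp -d, and it uses print key, $9, $7, cnt[key]++ for the file1 lines, since the desired output shows file1's $9 followed by $7.

```shell
#!/bin/sh
# Sample input and desired output copied from the question.
dir=$(mktemp -d)
cat > "$dir/file1" <<'EOF'
SITE-A SERV-A AA 1.00 PPA IP 98a7df9asd7f FEX 98a7df9asd7f_a
SITE-A SERV-A AA 1.00 PPA IP 98a7df9asd7g FEX 98a7df9asd7f_b
SITE-A SERV-A AA 1.00 PPA IP 98a7df9asd7h FEX 98a7df9asd7f_c
SITE-B SERV-A BB 1.00 DF IP a7sdf9899hhh FEX a7sdf9899hhh_a
SITE-B SERV-A BB 1.00 DF IP a7sdf9899hhf FEX a7sdf9899hhh_b
SITE-B SERV-A BB 1.00 AF IP a7sdf9899hhm FEX a7sdf9899hhh_c
EOF
cat > "$dir/file2" <<'EOF'
SITE-A 17 SERV-A 0 39 idx a7sdf9899778 0 0 out_fan pri
SITE-A 17 SERV-A 1 1 test a7sdf9899779 1 0 out_fan pri
SITE-A 17 SERV-A 2 32 dummy_host a7sdf9899770 2 0 out_fan pri
SITE-C 22 SERV-A 2 519 dummy_host a7sdf9899772 2 2 out_fan pri
SITE-C 22 SERV-A 3 520 prod a7sdf9899775 3 out_fan pri
SITE-C 22 SERV-A 4 521 dev a7sdf9899774 4 out_fan pri
EOF
cat > "$dir/expected" <<'EOF'
SITE-A SERV-A idx a7sdf9899778 0
SITE-A SERV-A test a7sdf9899779 1
SITE-A SERV-A dummy_host a7sdf9899770 2
SITE-A SERV-A 98a7df9asd7f_a 98a7df9asd7f 3
SITE-A SERV-A 98a7df9asd7f_b 98a7df9asd7g 4
SITE-A SERV-A 98a7df9asd7f_c 98a7df9asd7h 5
SITE-B SERV-A a7sdf9899hhh_a a7sdf9899hhh 0
SITE-B SERV-A a7sdf9899hhh_b a7sdf9899hhf 1
SITE-B SERV-A a7sdf9899hhh_c a7sdf9899hhm 2
SITE-C SERV-A dummy_host a7sdf9899772 2
SITE-C SERV-A prod a7sdf9899775 3
SITE-C SERV-A dev a7sdf9899774 4
EOF
# The ordered version of the script, printing file1's $9 and $7.
cat > "$dir/tst.awk" <<'EOF'
NR==FNR {
    key = $1 FS $3
    if ( !(key in a) ) keys[++numKeys] = key
    a[key] = a[key] key OFS $6 OFS $7 OFS $8 ORS
    cnt[key]++
    next
}
{
    key = $1 FS $2
    if ( key != prev ) {
        printf "%s", a[key]
        delete a[key]
        prev = key
    }
    print key, $9, $7, cnt[key]++
}
END {
    for ( keyNr = 1; keyNr <= numKeys; keyNr++ ) {
        key = keys[keyNr]
        printf "%s", a[key]
    }
}
EOF

# file2 must come first so NR==FNR selects it.
awk -f "$dir/tst.awk" "$dir/file2" "$dir/file1" > "$dir/actual"
if cmp -s "$dir/expected" "$dir/actual"; then result=match; else result=differ; fi
echo "$result"
rm -rf "$dir"
```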
© www.soinside.com 2019 - 2024. All rights reserved.