Count occurrences of strings in the second column for each value in the first column of a file

Question (votes: 0, answers: 2)

I have this input text file:

CD196_RS15035       normal alleles
CD196_RS15035       normal alleles
CD196_RS15035       truncation in the allele
CD196_RS15035       truncation in the allele
CD196_RS15035       no stop for allele
CD196_RS15035       no stop for allele
CD196_RS16835       normal alleles
CD196_RS16835       truncation in the allele
CD196_RS16835       no stop for allele
CD196_RS16835       no stop for allele

For each value in the first column, I want to count how many times each string in the second column occurs.

I want an output text file like this:

CD196_RS15035  normal alleles  2    truncation in the allele   2    no stop for allele  2
 
CD196_RS16835  normal alleles  1    truncation in the allele   1    no stop for allele  2

Any hints would be helpful. Thank you.

linux awk text-processing
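The core of the task is counting how often each (column 1, column 2) pair occurs; the answers below then reshape those counts into one row per ID. As a minimal sketch of just the counting step, assuming each line holds exactly one ID and one status with consistent spacing between them, `sort` plus `uniq -c` already gives the pair counts. The function name `pair_counts` is hypothetical.

```shell
# Hypothetical helper: count how often each identical line, i.e. each
# (column 1, column 2) pair, occurs. uniq -c only merges adjacent
# duplicate lines, so the input is sorted first.
pair_counts() {
    sort "$@" | uniq -c
}
```

Run it as `pair_counts test.txt`; each output line is a count followed by the original line. Because `uniq` compares whole lines byte for byte, two lines that differ only in the amount of whitespace between the columns are counted separately.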
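2 Answers
1 vote

Using awk's multidimensional arrays (arrays of arrays such as a[$1][$2] are a GNU awk extension and need gawk 4.0 or later):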

awk -F'[ ]{2,}' '
    { a[$1][$2]++ }                 # count each (column 1, column 2) pair
    END {
        for (i in a) {              # one output line per value in column 1
            printf("%s ", i)
            for (j in a[i]) printf("%s %d ", j, a[i][j])
            print ""
        }
    }' test.txt

CD196_RS15035 normal alleles 2 no stop for allele 2 truncation in the allele 2 
CD196_RS16835 normal alleles 1 no stop for allele 2 truncation in the allele 1 

1 vote

One possible option is to "build" each output line by looping over the keys collected from each column, e.g.

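awk 'BEGIN {
    FS = OFS = "\t"          # input and output columns are tab-separated
}

{
    a[$1 FS $2]++            # count each (column 1, column 2) pair
    b[$1]                    # remember every value seen in column 1
    c[$2]                    # remember every value seen in column 2
}

END {
    for (i in b) {
        output = i
        for (j in c) {
            # "+ 0" prints 0 instead of an empty field for pairs that never occur
            output = output OFS j OFS (a[i FS j] + 0)
        }
        print output
    }
}' file.txt
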
CD196_RS15035   normal alleles  2   no stop for allele  2   truncation in the allele    2
CD196_RS16835   normal alleles  1   no stop for allele  2   truncation in the allele    1
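Both answers iterate with `for (… in …)`, so the order of rows and of statuses is unspecified, and this second answer expects tab-separated input while the sample in the question looks space-aligned. A sketch that stays within POSIX awk and reproduces the question's layout records the order in which IDs and statuses first appear in indexed arrays. It assumes the columns are separated by a tab or by two or more spaces, prints only the statuses that actually occur for each ID, and uses a hypothetical function name, `count_in_order`.

```shell
# Hypothetical helper: POSIX awk only, no gawk extensions.
# Assumes columns are separated by a tab or by two or more spaces.
count_in_order() {
    awk '
        BEGIN { FS = "\t|  +" }      # a tab, or two or more spaces, separates the columns
        {
            # remember IDs and statuses in the order they first appear
            if (!($1 in seen_id)) { seen_id[$1] = 1; ids[++n_ids] = $1 }
            if (!($2 in seen_st)) { seen_st[$2] = 1; sts[++n_sts] = $2 }
            count[$1, $2]++           # pair key joined with SUBSEP
        }
        END {
            for (i = 1; i <= n_ids; i++) {
                line = ids[i]
                for (j = 1; j <= n_sts; j++)
                    if ((ids[i], sts[j]) in count)   # skip statuses this ID never has
                        line = line "  " sts[j] "  " count[ids[i], sts[j]]
                print line
            }
        }' "$@"
}
```

Run as `count_in_order test.txt > counts.txt`. On the sample input from the question this lists the statuses in the order normal alleles, truncation in the allele, no stop for allele, matching the desired output. Writing `"  +"` instead of the interval expression `{2,}` avoids relying on interval support, which some older awks lack.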
© www.soinside.com 2019 - 2024. All rights reserved.