I have this input text file:
CD196_RS15035 normal alleles
CD196_RS15035 normal alleles
CD196_RS15035 truncation in the allele
CD196_RS15035 truncation in the allele
CD196_RS15035 no stop for allele
CD196_RS15035 no stop for allele
CD196_RS16835 normal alleles
CD196_RS16835 truncation in the allele
CD196_RS16835 no stop for allele
CD196_RS16835 no stop for allele
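For each value in the first column, I want to count how many times each string in the second column occurs.
I would like an output text file like this: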
CD196_RS15035 normal alleles 2 truncation in the allele 2 no stop for allele 2
CD196_RS16835 normal alleles 1 truncation in the allele 1 no stop for allele 2
Any hints would be helpful. Thank you.
With GNU awk's multidimensional arrays (arrays of arrays, gawk 4.0 or later):
awk -F'[ ]{2,}' '
# Requires GNU awk 4.0+ (arrays of arrays); assumes columns are separated by two or more spaces
{ a[$1][$2]++ }
END {
    for (i in a) {
        printf("%s ", i)
        for (j in a[i]) printf("%s %d ", j, a[i][j])
        print ""
    }
}' test.txt
CD196_RS15035 normal alleles 2 no stop for allele 2 truncation in the allele 2
CD196_RS16835 normal alleles 1 no stop for allele 2 truncation in the allele 1
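Note that awk does not guarantee the traversal order of `for (i in a)`, which is why the columns above come out in a different order than the one requested. A minimal sketch that keeps first-seen order and runs in any POSIX awk might look like the following. The function name `count_pairs` and the array names are hypothetical, and it assumes the same separator as above (two or more spaces):

```shell
# count_pairs FILE: count (ID, description) pairs, keeping first-seen order.
# Hypothetical helper; assumes columns are separated by two or more spaces.
count_pairs() {
  awk -F'[ ][ ]+' '
    {
      # Remember each ID and each description the first time it appears
      if (!($1 in seen_id))   { seen_id[$1];   ids[++n_id] = $1 }
      if (!($2 in seen_desc)) { seen_desc[$2]; descs[++n_desc] = $2 }
      count[$1, $2]++   # POSIX-style two-part key, joined with SUBSEP
    }
    END {
      for (i = 1; i <= n_id; i++) {
        line = ids[i]
        for (j = 1; j <= n_desc; j++)
          # "+ 0" prints 0 for a pair that never occurred
          line = line " " descs[j] " " (count[ids[i], descs[j]] + 0)
        print line
      }
    }
  ' "$1"
}
```

Called as `count_pairs test.txt`, this should list the descriptions in the order they first appear in the file, and it reports 0 for any description an ID never had.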
Another option is to "build" each output line by looping over the stored keys, e.g.
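awk 'BEGIN {
    FS = OFS = "\t"    # assumes tab-separated input
}
{
    a[$1 FS $2]++      # count each (ID, description) pair
    b[$1]              # record every ID
    c[$2]              # record every description
}
END {
    for (i in b) {
        output = i
        for (j in c) {
            # "+ 0" prints 0 rather than an empty string for a pair that never occurred
            output = output OFS j OFS (a[i FS j] + 0)
        }
        print output
    }
}' file.txt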
CD196_RS15035 normal alleles 2 no stop for allele 2 truncation in the allele 2
CD196_RS16835 normal alleles 1 no stop for allele 2 truncation in the allele 1
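The command above only works if the two columns are really separated by a tab. If the file uses spaces instead, one possible sketch is to normalize it first by splitting each line at the first run of whitespace. The function name `split_first_field` is hypothetical, and it assumes the ID itself never contains whitespace:

```shell
# split_first_field FILE: rewrite each line as "ID<TAB>description".
# Hypothetical helper; assumes the ID (first token) contains no whitespace.
split_first_field() {
  awk '
    NF < 2 { next }                 # skip blank lines and lines with no description
    {
      id = $1                       # default FS: first whitespace-delimited token
      desc = $0
      # Remove the ID and the whitespace that follows it, keeping the rest intact
      sub(/^[ \t]*[^ \t]+[ \t]+/, "", desc)
      print id "\t" desc
    }
  ' "$1"
}
```

Its output can then be piped into the tab-based command by leaving out the `file.txt` argument, so that awk reads from standard input.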