How can we process this data using awk / sed / Unix commands? I have data like the following:
/abc/def1.0/Acc101 500 50
/abc/def1.0/Acc101 401 27
/abc/def1.0/Acc101 200 101
/abc/def1.0/Acc201 200 4
/abc/def1.0/Acc301 304 2
/abc/def1.0/Acc401 200 204
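For each unique string in the first column ($1), how can we merge the values from the other columns? Column $2 is a status code: 200 means success, and anything else means failure. Column $3 is the event count.
Below is the sample output. It has one line per distinct $1, with the $3 counts summed separately for rows where $2 is 200 and rows where $2 is not 200. For example: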
/abc/def1.0/Acc101 101 77
/abc/def1.0/Acc201 4 0
/abc/def1.0/Acc301 0 2
/abc/def1.0/Acc401 204 0
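Explanation of the line: /abc/def1.0/Acc101 101 77
77 = 50 + 27, the sum of $3 over the rows where $2 is not 200; 101 is the $3 value of the one row where $2 is 200.
Thank you very much for your help.
Something like: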
awk '{ groups[$1] = 1; if ($2 == 200) succ[$1] += $3; else fail[$1] += $3 }
END { PROCINFO["sorted_in"] = "@ind_str_asc"
for (g in groups) print g, succ[g]+0, fail[g]+0 }' input.txt
/abc/def1.0/Acc101 101 77
/abc/def1.0/Acc201 4 0
/abc/def1.0/Acc301 0 2
/abc/def1.0/Acc401 204 0
If you are using GNU awk, the PROCINFO line produces sorted output. Otherwise the order is arbitrary, and if you want it sorted you can pipe the output to sort.
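For an awk that lacks `PROCINFO["sorted_in"]` (it is a GNU awk extension), the pipe-to-sort approach mentioned above might look like the sketch below. The `data` and `result` variable names are only illustrative, and the sample rows are copied from the question so the snippet runs on its own.

```shell
# Sample rows copied from the question; the variable names are illustrative.
data='/abc/def1.0/Acc101 500 50
/abc/def1.0/Acc101 401 27
/abc/def1.0/Acc101 200 101
/abc/def1.0/Acc201 200 4
/abc/def1.0/Acc301 304 2
/abc/def1.0/Acc401 200 204'

# Same aggregation as above, minus the gawk-only PROCINFO line.
# The for-in order is unspecified, so sort puts the lines in key order.
result=$(printf '%s\n' "$data" |
  awk '{ groups[$1] = 1; if ($2 == 200) succ[$1] += $3; else fail[$1] += $3 }
       END { for (g in groups) print g, succ[g]+0, fail[g]+0 }' |
  LC_ALL=C sort)

printf '%s\n' "$result"
```

`LC_ALL=C` is optional; it just makes the ordering byte-wise and independent of the current locale.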
For simplicity, you can read Input_file twice. Could you please try the following:
awk '
FNR==NR{
  mainarray[$1]
  if($2!=200){
    sum[$1]+=$NF
  }
  if($2==200){
    Found200[$1]+=$NF
  }
  next
}
($1 in mainarray) && !($1 in Found200){
  print $1,0,sum[$1]!=""?sum[$1]:0
  next
}
$2==200{
  print $1,Found200[$1]!=""?Found200[$1]:0,sum[$1]!=""?sum[$1]:0
}
' Input_file Input_file
Explanation: adding a detailed explanation of the above.
awk '  ##Starting awk program from here.
FNR==NR{  ##FNR==NR condition will be TRUE only while Input_file is being read the first time.
  mainarray[$1]  ##Creating an entry in array mainarray with index $1 here.
  if($2!=200){  ##Checking condition if 2nd field is NOT equal to 200 then do following.
    sum[$1]+=$NF  ##Creating array named sum with index $1 and keep adding last column value to it.
  }
  if($2==200){  ##Checking condition if 2nd field is equal to 200 then do following.
    Found200[$1]+=$NF  ##Creating array Found200 with index $1 and keep adding last column value to its value.
  }
  next  ##next will skip all further statements from here.
}
($1 in mainarray) && !($1 in Found200){  ##Checking condition if $1 is present in mainarray and $1 is NOT present in Found200 array.
  print $1,0,sum[$1]!=""?sum[$1]:0  ##Printing first field, zero and value of sum for $1 (0 if it is empty).
  next  ##next will skip all further statements from here.
}
$2==200{  ##Checking condition if 2nd field is 200 then do following.
  print $1,Found200[$1]!=""?Found200[$1]:0,sum[$1]!=""?sum[$1]:0  ##Printing first field, Found200 value and sum value (0 where empty).
}
' Input_file Input_file  ##Mentioning Input_file names here; the same file is read twice.
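One thing to watch with the two-pass version: the second pass prints once per matching line, so a key with several non-200 rows and no 200 row (or with several 200 rows) would show up more than once. A minimal sketch of one way to guard against that is below. It is a variation, not the answer's original code: the array names `ok`, `bad` and `printed` are made up for illustration, the second Acc301 row is a hypothetical addition to the sample data, and `mktemp` is used only so that the same file can be passed twice.

```shell
# Hypothetical input: Acc301 has two failure rows and no 200 row.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
/abc/def1.0/Acc101 500 50
/abc/def1.0/Acc101 401 27
/abc/def1.0/Acc101 200 101
/abc/def1.0/Acc301 304 2
/abc/def1.0/Acc301 500 3
EOF

result=$(awk '
FNR==NR {                 # first pass: accumulate per-key totals
  if ($2 == 200) ok[$1]  += $NF
  else           bad[$1] += $NF
  next
}
!printed[$1]++ {          # second pass: act only on the first line of each key
  print $1, ok[$1]+0, bad[$1]+0   # +0 turns an unset total into 0
}
' "$tmp" "$tmp")
rm -f "$tmp"

printf '%s\n' "$result"
```

Because the second pass walks the file in its original order, each key is printed at the position of its first appearance, so no separate sort is needed when the input is already grouped.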