Merging duplicate column values into rows by condition with awk / unix

Question (votes: 1, answers: 2)

How can we process the following data with awk / sed / unix commands? I have data like this:

/abc/def1.0/Acc101 500 50
/abc/def1.0/Acc101 401 27
/abc/def1.0/Acc101 200 101
/abc/def1.0/Acc201 200 4
/abc/def1.0/Acc301 304 2
/abc/def1.0/Acc401 200 204

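For each unique string in the first column ($1), how can we merge its rows into a single line? Column $2 is a status code: 200 means success, and any other value means failure. Column $3 is the event count.

Below is the desired output: one line per distinct $1, followed by the sum of the $3 counts where $2 is 200, and then the sum of the $3 counts where $2 is not 200. For example: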

/abc/def1.0/Acc101 101 77
/abc/def1.0/Acc201 4 0
/abc/def1.0/Acc301 0 2
/abc/def1.0/Acc401 204 0

Explanation of the line: /abc/def1.0/Acc101 101 77

101 = the $3 count where $2 = 200
77 = 50 + 27, the sum of $3 where $2 is not 200

Thank you very much for your help.

shell awk sed
2 Answers
1
vote

Something like

awk '{ groups[$1] = 1; if ($2 == 200) succ[$1] += $3; else fail[$1] += $3 }
     END { PROCINFO["sorted_in"] = "@ind_str_asc"
           for (g in groups) print g, succ[g]+0, fail[g]+0 }' input.txt
/abc/def1.0/Acc101 101 77
/abc/def1.0/Acc201 4 0
/abc/def1.0/Acc301 0 2
/abc/def1.0/Acc401 204 0

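With GNU awk, the PROCINFO line produces sorted output. With other awk implementations that setting has no effect and the order of the for-in loop is arbitrary; if you want sorted output, pipe the result through sort.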
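The pipe-to-sort variant mentioned above could look roughly like this. It is a minimal sketch, not part of the original answer: the temporary file stands in for input.txt, and it assumes that byte-wise ordering on the first column (`LC_ALL=C sort -k1,1`) is the ordering you want.

```shell
# Recreate the sample data from the question in a temporary file
# (stand-in for input.txt; the temp file is an assumption of this sketch).
input=$(mktemp)
cat > "$input" <<'EOF'
/abc/def1.0/Acc101 500 50
/abc/def1.0/Acc101 401 27
/abc/def1.0/Acc101 200 101
/abc/def1.0/Acc201 200 4
/abc/def1.0/Acc301 304 2
/abc/def1.0/Acc401 200 204
EOF

# Same aggregation as above, but without PROCINFO, so it does not rely on GNU awk.
# The for-in order is unspecified, so sort puts the lines in key order afterwards.
out=$(awk '{ groups[$1] = 1; if ($2 == 200) succ[$1] += $3; else fail[$1] += $3 }
     END { for (g in groups) print g, succ[g]+0, fail[g]+0 }' "$input" |
      LC_ALL=C sort -k1,1)

printf '%s\n' "$out"
rm -f "$input"
```

Because every key appears exactly once in the awk output, sorting on the first field alone is enough to get a stable order.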


0
votes

For simplicity, you can read Input_file twice. Could you please try the following.

awk '
FNR==NR{
  mainarray[$1]
  if($2!=200){
    sum[$1]+=$NF
  }
  if($2==200){
    Found200[$1]+=$NF
  }
  next
}
($1 in mainarray) && !($1 in Found200){
  print $1,0,sum[$1]!=""?sum[$1]:0
  next
}
$2==200{
  print $1,Found200[$1]!=""?Found200[$1]:0,sum[$1]!=""?sum[$1]:0
}
'  Input_file  Input_file

Explanation: the same program as above, with a comment on each line.

awk '                                                           ##Start of the awk program.
FNR==NR{                                                        ##FNR==NR is TRUE only while Input_file is being read the first time.
  mainarray[$1]                                                 ##Create an entry in mainarray indexed by $1.
  if($2!=200){                                                  ##If the 2nd field is NOT 200, do the following.
    sum[$1]+=$NF                                                ##Add the last column value to array sum, indexed by $1.
  }
  if($2==200){                                                  ##If the 2nd field is 200, do the following.
    Found200[$1]+=$NF                                           ##Add the last column value to array Found200, indexed by $1.
  }
  next                                                          ##next skips all further statements for this line.
}
($1 in mainarray) && !($1 in Found200){                         ##Second read: $1 is present in mainarray and NOT present in Found200.
  print $1,0,sum[$1]!=""?sum[$1]:0                              ##Print the first field, zero, and the sum for $1 (0 if unset).
  next                                                          ##next skips all further statements for this line.
}
$2==200{                                                        ##If the 2nd field is 200, do the following.
  print $1,Found200[$1]!=""?Found200[$1]:0,sum[$1]!=""?sum[$1]:0  ##Print the first field, the Found200 value and the sum value (0 if unset).
}
' Input_file  Input_file                                        ##Input_file is named twice so that it is read two times.
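
One caveat with the two-pass program above: on the second read it prints one line per matching record, so a key with several 200 rows, or a key with several non-200 rows and no 200 row, would be printed more than once. The sample data does not trigger this. A minimal sketch of one way to guard against it, assuming each key should be printed exactly once in first-seen order; the array names `ok`, `bad` and `printed` and the temporary file are illustrative and not part of the original answer:

```shell
# Recreate the sample data from the question in a temporary file
# (stand-in for Input_file; the temp file is an assumption of this sketch).
input=$(mktemp)
cat > "$input" <<'EOF'
/abc/def1.0/Acc101 500 50
/abc/def1.0/Acc101 401 27
/abc/def1.0/Acc101 200 101
/abc/def1.0/Acc201 200 4
/abc/def1.0/Acc301 304 2
/abc/def1.0/Acc401 200 204
EOF

out=$(awk '
FNR == NR {                      # first read: accumulate totals per key
  if ($2 == 200) ok[$1]  += $NF  # success counts
  else           bad[$1] += $NF  # failure counts
  next
}
!printed[$1]++ {                 # second read: true only the first time a key is seen
  print $1, ok[$1] + 0, bad[$1] + 0   # "+ 0" turns an unset total into 0
}
' "$input" "$input")

printf '%s\n' "$out"
rm -f "$input"
```

The second read is kept only to preserve the input order of the keys; if order does not matter, a single read with an END block is enough.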