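I'm trying to extract all (and only) the duplicate values from a pipe-delimited file.
My data file has 800,000 lines with several columns, and I'm particularly interested in column 3. I need to find the duplicate values in column 3 and then extract every line from the file that has one of those duplicate values.
I can do this as follows: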
awk -F'|' '{print $3}' Report.txt | sort | uniq -d > dup.txt
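As an aside, the duplicate values can also be listed in a single awk pass without sort and uniq. This is a minimal sketch, assuming pipe-delimited input with the key in field 3; the function name dup_values is hypothetical. It prints each duplicated value once, at the moment it is seen for the second time, so the output is in order of second occurrence rather than sorted.

```shell
#!/bin/sh
# dup_values (hypothetical helper): print each field-3 value that occurs
# more than once in a pipe-delimited file, once per value.
dup_values() {
    # count[$3]++ == 1 is true only on the second occurrence of a value,
    # so each duplicated value is printed exactly once.
    awk -F'|' 'count[$3]++ == 1 { print $3 }' "$1"
}

# Demo on part of the sample data from the question.
tmp=$(mktemp)
printf '%s\n' \
    '1|learning|Unix|Business|Requirements' \
    '2|learning|Unix|Business|Team' \
    '3|learning|Linux|Business|Requirements' \
    '5|learning|Linux|Business|Requirements' \
    '7|learning|Windows|Business|Requirements' > "$tmp"
dup_values "$tmp"    # prints Unix, then Linux
rm -f "$tmp"
```

Usage: dup_values Report.txt > dup.txt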
Then I feed those values into a loop, as follows:
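while IFS= read -r dup
do
    # -F treats $dup as a fixed string; it still matches anywhere on the line, not only in column 3
    grep -F -- "$dup" Report.txt >> only_dup.txt
done < dup.txt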
I also tried an awk approach:
while IFS= read -r dup
do
    # -F'|' is needed so that $3 is the third pipe-delimited column
    awk -F'|' -v a="$dup" '$3 == a { print $0 }' Report.txt >> only_dup.txt
done < dup.txt
However, because the file contains so many records, both approaches take a very long time to finish. So I'm looking for a simpler, faster option.
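One reason both loops are slow is that they rescan Report.txt once for every duplicated value. A minimal sketch of a single-pass alternative, assuming dup.txt holds one field-3 value per line as produced by the sort | uniq -d pipeline: awk first loads dup.txt into an array, then prints each Report.txt line whose field 3 is a key in that array. The function name only_dups_from_list is hypothetical.

```shell
#!/bin/sh
# only_dups_from_list (hypothetical helper)
#   $1: file with one duplicated field-3 value per line (e.g. dup.txt)
#   $2: pipe-delimited data file (e.g. Report.txt)
only_dups_from_list() {
    # While reading the first file (NR == FNR), store each value as an array key.
    # For the second file, print lines whose field 3 is one of those keys.
    # Unlike the grep loop, this compares field 3 exactly instead of matching a substring.
    awk -F'|' 'NR == FNR { dup[$0]; next } $3 in dup' "$1" "$2"
}
```

Usage: only_dups_from_list dup.txt Report.txt > only_dup.txt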
For example, I have data like this:
1|learning|Unix|Business|Requirements
2|learning|Unix|Business|Team
3|learning|Linux|Business|Requirements
4|learning|Unix|Business|Team
5|learning|Linux|Business|Requirements
6|learning|Unix|Business|Team
7|learning|Windows|Business|Requirements
And here is my expected output, which excludes the unique records:
2|learning|Unix|Business|Team
4|learning|Unix|Business|Team
6|learning|Unix|Business|Team
3|learning|Linux|Business|Requirements
5|learning|Linux|Business|Requirements
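If the goal is every line whose column 3 value occurs more than once, dup.txt and the loop can be dropped entirely by reading the file twice in one awk command: the first pass counts field 3, the second prints matching lines in their original order. A minimal sketch assuming pipe-delimited input; dup_by_col3 is a hypothetical name. Note that, keyed on column 3 alone, line 1 (Unix) is also a duplicate, so it appears in this output even though the expected output above omits it.

```shell
#!/bin/sh
# dup_by_col3 (hypothetical helper): print every line of a pipe-delimited
# file whose field 3 occurs more than once, in original file order.
dup_by_col3() {
    # Pass 1 (NR == FNR): count each field-3 value.
    # Pass 2: print lines whose field-3 count is greater than 1.
    awk -F'|' 'NR == FNR { count[$3]++; next } count[$3] > 1' "$1" "$1"
}

# Demo on the sample data from the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
1|learning|Unix|Business|Requirements
2|learning|Unix|Business|Team
3|learning|Linux|Business|Requirements
4|learning|Unix|Business|Team
5|learning|Linux|Business|Requirements
6|learning|Unix|Business|Team
7|learning|Windows|Business|Requirements
EOF
dup_by_col3 "$tmp"    # prints lines 1-6; line 7 (Windows) is unique
rm -f "$tmp"
```

Reading the file twice costs a second sequential scan, but memory stays at one counter per distinct column 3 value.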
Another awk approach:
$ awk '{
    n = $1                   # store the line number (first field)
    sub(/^[^ ]*/, "", $0)    # strip the first space-delimited field from $0
    if ($0 in a) {           # has the rest of the line been seen before?
        if (a[$0] == 1)      # second occurrence: also print the first one
            print b[$0] $0   # its number plus the rest of the line
        print n $0           # print the current line
    }
    a[$0]++                  # count occurrences of the rest of the line
    b[$0] = n                # remember the number; only read back on the 2nd occurrence
}' file
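The script above keys on everything after the first space-separated field. A sketch of the same one-pass idea adapted to the pipe-delimited data and keyed on field 3 only (an assumption, following the question's description): the first line for each value is held back and printed only once a second line with that value appears. dup_rows_one_pass is a hypothetical name.

```shell
#!/bin/sh
# dup_rows_one_pass (hypothetical helper): single pass over a pipe-delimited
# file, printing lines whose field 3 occurs more than once.
dup_rows_one_pass() {
    awk -F'|' '
    {
        n = ++count[$3]               # occurrences of this field-3 value so far
        if (n == 1) {
            first[$3] = $0            # hold the first line until a duplicate shows up
        } else {
            if (n == 2) {
                print first[$3]       # flush the held first line exactly once
                delete first[$3]      # no longer needed
            }
            print                     # second and later occurrences print immediately
        }
    }' "$1"
}
```

On the seven-line pipe-delimited sample this prints lines 1, 2, 4, 3, 5 and 6 in that order: a group's first line is emitted only when its second line is reached.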
Output for the sample data (with space-separated fields):
2 learning Unix Business Team
4 learning Unix Business Team
3 learning Linux Business Requirements
5 learning Linux Business Requirements
6 learning Unix Business Team
The example in your question isn't clear, but given this input file:
$ cat file
1 | whatever | learning Unix Business Requirements
2 | whatever | learning Unix Business Team
3 | whatever | learning Linux Business Requirements
4 | whatever | learning Unix Business Team
5 | whatever | learning Linux Business Requirements
6 | whatever | learning Unix Business Team
7 | whatever | learning Windows Business Requirements
this may be what you want:
$ cat tst.awk
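BEGIN { FS="|" }
{ currKey = $3 }                # key is the third pipe-delimited field
currKey == prevKey {            # same key as the previous (sorted) line
    if ( !prevPrinted++ ) {     # first repeat: also print the stored previous line
        print prevRec
    }
    print
    next
}
{                               # new key: remember this line
    prevKey = currKey
    prevRec = $0
    prevPrinted = 0
}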
$ sort -t'|' -k3,3 file | awk -f tst.awk
3 | whatever | learning Linux Business Requirements
5 | whatever | learning Linux Business Requirements
2 | whatever | learning Unix Business Team
4 | whatever | learning Unix Business Team
6 | whatever | learning Unix Business Team
Running the above on the newly posted sample input:
$ sort -t'|' -k3,3 file | awk -f tst.awk
3|learning|Linux|Business|Requirements
5|learning|Linux|Business|Requirements
1|learning|Unix|Business|Requirements
2|learning|Unix|Business|Team
4|learning|Unix|Business|Team
6|learning|Unix|Business|Team
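Keyed on field 3, line 1 is included above because Unix also appears on lines 2, 4 and 6. If instead a line should count as a duplicate only when everything after the first field matches (which is what the question's expected output implies), the same sort-then-compare approach can key on the rest of the line. A sketch under that assumption; dup_rows_rest_key is a hypothetical name.

```shell
#!/bin/sh
# dup_rows_rest_key (hypothetical helper): print lines of a pipe-delimited
# file whose text after the first field also appears on another line.
dup_rows_rest_key() {
    # sort -k2 sorts on field 2 through end of line, so equal keys end up adjacent.
    sort -t'|' -k2 "$1" | awk '
    {
        key = substr($0, index($0, "|") + 1)   # everything after the first "|"
    }
    NR > 1 && key == prevKey {
        if (!prevPrinted++) print prevRec      # print the first line of the group once
        print
        next
    }
    {
        prevKey = key
        prevRec = $0
        prevPrinted = 0
    }'
}
```

On the pipe-delimited sample this selects lines 2, 3, 4, 5 and 6, grouped by key, matching the set of lines in the expected output. The NR > 1 guard stops an empty key on the first line from matching the uninitialized prevKey.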