Delete duplicate lines from a text file based on a specific duplicate condition


I have a text file from which I want to delete some lines. Sample contents of the file:

v1 has output 1.1
v2 has output 10.2
v3 has output 5.4
v4 has output 1.1
v5 has output 10.2
v6 has output 12
------------------
and so on

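As you can see above, the values 1.1 and 10.2 are repeated several times. For 1.1, 10.2, and every other value like them (the values vary, and there are hundreds of different numbers), I want to keep the first 10 lines and delete all later duplicates, even though the v parameter differs on each line. I also want to keep the lines whose values are not duplicated.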

I tried using sort with uniq, but that only removes lines that are exact duplicates, not duplicates based on a specific condition.

sort file.txt | uniq -i
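For context on why `sort | uniq` falls short: `uniq` only collapses adjacent lines that compare equal and keeps exactly one line from each run. The POSIX `uniq -f N` option skips the first N fields, which gets closer, but the result is still one line per value rather than ten, and sorting discards the original order. A small sketch on the sample data, assuming the value is always the 4th whitespace-separated field; the function names are made up for illustration:

```shell
# Sample lines from the question; the value is assumed to be field 4.
sample='v1 has output 1.1
v2 has output 10.2
v3 has output 5.4
v4 has output 1.1
v5 has output 10.2
v6 has output 12'

# Roughly what the question tried: uniq compares whole lines (and
# only adjacent ones). Every line differs in its vN prefix, so
# nothing is removed.
whole_line_dedup() { sort | uniq; }

# Sort on field 4, then let uniq skip the first 3 fields so it
# compares only the value. This groups lines by value, but keeps
# just ONE line per value and no longer preserves the input order.
# LC_ALL=C makes the sort order byte-wise and predictable.
one_per_value() { LC_ALL=C sort -k4,4 | uniq -f 3; }

printf '%s\n' "$sample" | whole_line_dedup   # all 6 lines
printf '%s\n' "$sample" | one_per_value      # 4 lines, one per value
```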
2 Answers
1 vote

It sounds like all you need is:

awk '++cnt[$NF]<11' file

For example, with a limit of 2 (`<3`) so the effect shows on a short file:

$ cat file
v1 has output 1.1
v2 has output 10.2
v3 has output 5.4
v4 has output 1.1
v5 has output 10.2
v6 has output 12
v7 has output 1.1
v8 has output 10.2
v9 has output 5.4
v10 has output 1.1
v11 has output 10.2
v12 has output 12

$ awk '++cnt[$NF]<3' file
v1 has output 1.1
v2 has output 10.2
v3 has output 5.4
v4 has output 1.1
v5 has output 10.2
v6 has output 12
v9 has output 5.4
v12 has output 12
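In `++cnt[$NF]<11`, the counter for the last field's value is incremented before the comparison, so the condition holds for that value's 1st through 10th occurrence, and awk prints those lines (a pattern with no action prints the line). A sketch that passes the limit in with awk's standard `-v` option, so it reads as "keep at most N"; the function name `keep_first_n` is hypothetical:

```shell
# keep_first_n N [FILE...]   (hypothetical helper name)
# Print each line unless its last field ($NF) has already been
# seen N times; surviving lines keep their original order.
# Reads standard input when no FILE is given.
keep_first_n() {
    n=$1
    shift
    awk -v n="$n" '
        BEGIN { n += 0 }      # force a numeric comparison
        ++cnt[$NF] <= n       # true for the 1st..Nth occurrence: print
    ' "$@"
}

# Same data as the example above; N=2 matches "<3" there.
printf '%s\n' \
  'v1 has output 1.1' 'v2 has output 10.2' 'v3 has output 5.4' \
  'v4 has output 1.1' 'v5 has output 10.2' 'v6 has output 12' \
  'v7 has output 1.1' 'v8 has output 10.2' 'v9 has output 5.4' \
  'v10 has output 1.1' 'v11 has output 10.2' 'v12 has output 12' |
  keep_first_n 2
```

Because `cnt` is indexed by the field's text, `1.1` and `1.10` get separate counters. Indexing with `cnt[$NF+0]` instead would merge numerically equal spellings, at the cost of keys going through awk's `CONVFMT` (`%.6g` by default), which can also merge values that differ only beyond six significant digits.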

1 vote

Here is one way with awk:

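$ awk '($4==1.1 || $4==10.2) {if (a[$4]++ < 10) print; next} 1' file
v1 has output 1.1
v2 has output 10.2
v3 has output 5.4
v4 has output 1.1
v5 has output 10.2
v6 has output 12

This prints at most the first 10 lines whose 4th field is 1.1 and, counted separately, at most the first 10 whose 4th field is 10.2, plus every other line unchanged. The values to cap are hardcoded, so each one has to be listed in the condition.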
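When only some values should be capped and everything else passes through untouched, as in this answer, the list of values can be passed in rather than hardcoded. A sketch assuming the value is field 4 and values are matched as exact text; `cap_values` is a hypothetical name:

```shell
# cap_values N "V1 V2 ..." [FILE...]   (hypothetical helper name)
# Lines whose 4th field is one of the listed values are printed
# only for the first N occurrences of that value; all other
# lines are printed unchanged, in their original order.
cap_values() {
    n=$1
    vals=$2
    shift 2
    awk -v n="$n" -v vals="$vals" '
        BEGIN {
            n += 0                            # force numeric comparison
            k = split(vals, list, " ")        # blank-separated value list
            for (i = 1; i <= k; i++) cap[list[i]] = 1
        }
        ($4 in cap) { if (seen[$4]++ < n) print; next }
        { print }                             # uncapped values: always keep
    ' "$@"
}

printf '%s\n' \
  'v1 has output 1.1' 'v2 has output 10.2' 'v3 has output 5.4' \
  'v4 has output 1.1' 'v5 has output 10.2' 'v6 has output 12' \
  'v7 has output 1.1' 'v8 has output 5.4' |
  cap_values 1 "1.1 10.2"
```

`split(vals, list, " ")` uses awk's default blank splitting, so extra spaces in the value list are harmless.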

© www.soinside.com 2019 - 2024. All rights reserved.