Text processing: ignore everything from the second underscore onward

Question (votes: -1, answers: 1)

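In the second column, everything from the second underscore onward should be ignored. The result should then be sorted, with duplicate lines removed.

I tried:

awk -F_ '{print $2}' file1 >> file2; sort file1 | uniq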

****** FROM ********

GGGGGGG             DDDDD   --> header
XYSER_YURTZ     SUMOT_2_058A     
XYSER_YURTZ     SUMOT_2_058B    
XYSER_YURTZ     HJRIT_6_51A     
XYSER_YURTZ     HJRIT_6_51B     
XYSER_YURTZ     HJRIT_6_51C    
XYSER_YURTZ     HJRIT_6_51D    
XYSER_YURTZ     HJRIT_6_51E    
XYSER_YURTZ     HJRIT_6_51F    
XYSER_YURTZ     HJRIT_6_520    
XYSER_YURTZ     HJRIT_6_521    
XYSER_GFRE      SUMOT_2_16C3    
XYSER_GFRE      SUMOT_2_16C4    
XYSER_GFRE      SUMOT_2_16C5    
XYSER_GFRE      SUMOT_2_16C6  
XYSER_GFRE      SUMOT_2_16C7  
XYSER_GFRE      SUMOT_2_16C8  
XYSER_GFRE      SUMOT_2_16C9  
XYSER_GFRE      SUMOT_2_16CA  
XYSER_GFRE      SUMOT_2_16CB  
XYSER_GFRE      SUMOT_2_16CC   
XYSER_GFRE      SUMOT_2_16CD  
XYSER_GFRE      SUMOT_2_16CE   
XYSER_GFRE      SUMOT_2_16CF  
XYSER_GFRE      SUMOT_2_16D0  
XYSER_GFRE      SUMOT_2_16D1  
XYSER_GFRE      SUMOT_2_16D2  
XYSER_GFRE      SUMOT_2_16D3  
XYSER_GFRE      SUMOT_2_16D4  
XYSER_GFRE      HJRIT_6_12E1    
XYSER_GFRE      HJRIT_6_12E2    
XYSER_GFRE      HJRIT_6_12E3    
XYSER_GFRE      HJRIT_6_12E4    
XYSER_GFRE      HJRIT_6_12E5   
XYSER_GFRE      HJRIT_6_12E6   
XYSER_GFRE      HJRIT_6_12E7   
XYSER_GFRE      HJRIT_6_12E8   
XYSER_GFRE      HJRIT_6_12E9   
XYSER_GFRE      HJRIT_6_12EA   
XYSER_GFRE      HJRIT_6_12EB   
XYSER_GFRE      HJRIT_6_12EC   
XYSER_GFRE      HJRIT_6_12ED   
XYSER_ALY1      XYSER_ALY1_0000   
XYSER_ALY       SUMOT_2_0497   
XYSER_ALY       SUMOT_2_0498   
XYSER_BAP01     SUMOT_2_020E 

TO

************** OUTPUT1 **************

GGGGGGG DDDDD   
XYSER_YURTZ SUMOT_2   
XYSER_YURTZ HJRIT_6   
XYSER_GFRE SUMOT_2   
XYSER_GFRE HJRIT_6   
XYSER_ALY1 XYSER_ALY1   
XYSER_ALY SUMOT_2       
XYSER_BAP01 SUMOT_2   
XYSER_BAP02 SUMOT_2   

************** OUTPUT2 **************

DDDDD GGGGGGG   
SUMOT_2 XYSER_YURTZ  
SUMOT_2 XYSER_GFRE  
SUMOT_2 XYSER_ALY  
SUMOT_2 XYSER_BAP01  
SUMOT_2 XYSER_BAP02  
HJRIT_6 XYSER_YURTZ  
HJRIT_6 XYSER_GFRE  
XYSER_ALY1 XYSER_ALY1  
unix awk text-processing
1 Answer

Votes: 0

With your sample input, you can use

sed 's/_[^_]*$//' inputfile | sort | uniq

This removes the last underscore on each line and all characters after it.
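The sed expression works on the whole line, so it cuts at the last underscore anywhere on the line. If the second column could ever contain more than two underscores, a field-aware variant may match the question more literally. A minimal sketch, assuming two whitespace-separated columns; trim_second_field is just an illustrative name:

```shell
#!/bin/sh
# Trim column 2 at its second underscore, leaving column 1 untouched.
# Assumes two whitespace-separated columns; any extra fields are dropped.
trim_second_field() {
  awk 'NF == 0 { next }                # skip blank lines
  {
    n = split($2, parts, "_")          # split column 2 on "_"
    if (n >= 3)                        # at least two underscores present
      $2 = parts[1] "_" parts[2]       # keep only the text before the 2nd "_"
    print $1, $2                       # fields joined by a single space
  }'
}
```

Usage: trim_second_field < inputfile | sort | uniq. It also collapses the column spacing to a single space, as in OUTPUT1, and drops anything after the second column (such as the "--> header" annotation). Printing $2, $1 instead of $1, $2 gives the column order of OUTPUT2.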

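Note: the sort command may place the header between other lines, because it sorts all lines, header included, according to the current locale's collation order. In your example this is not a problem, because the header line GGGGGGG... sorts before XYSER_....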
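If the header could sort after the data lines, one way to keep it on top is to read it off before the rest of the input goes through sort. A sketch, assuming the header is exactly the first line of the input (sort_keep_header is an illustrative name):

```shell
#!/bin/sh
# Print the first (header) line unchanged, then strip, sort and
# deduplicate only the remaining lines.
sort_keep_header() {
  IFS= read -r header              # consume exactly one line from stdin
  printf '%s\n' "$header"          # emit the header first
  sed 's/_[^_]*$//' | sort -u      # process everything after the header
}
```

Usage: sort_keep_header < inputfile. Here sort -u stands in for sort | uniq; the shell's read builtin stops at the first newline, so sed receives the rest of the input intact.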

If you know that similar lines are already grouped together in the input file, you can omit the sort and use

sed 's/_[^_]*$//' inputfile | uniq
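If the lines are not guaranteed to be grouped, uniq will miss duplicates that are not adjacent. A common awk idiom removes duplicates anywhere in the input while keeping each line in the position where it first appears (the row order OUTPUT1 uses), without sorting. A sketch (dedupe_in_order is an illustrative name):

```shell
#!/bin/sh
# Strip from the last underscore (the answer's sed expression), then
# print each resulting line only the first time it is seen.
dedupe_in_order() {
  sed 's/_[^_]*$//' | awk '!seen[$0]++'
}
```

Usage: dedupe_in_order < inputfile. The trade-off is that awk keeps every distinct line in memory, which is fine for files of this size.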