Processing (excluding rows) a specific column of a .csv file using UNIX / Linux

Question (votes: 0, answers: 2)

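I want to access and process the fourth column of a csv file. In particular, I want to exclude the rows that do not meet a specific requirement (exclude rows that do not contain a 3-character country code).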

My dataset:

Luxembourg,LUX,2017,9294689.12
Aruba,ABW,2017,927865.82
Nepal,NPL,2017,9028196.37
Bangladesh,BGD,2017,88057460.51
Costa Rica,CRI,2017,8695008.05
Chile,CHL,2017,84603249.72
Cook Islands,COK,2017,82045.41
World,OWIDWRL,1755,9361520
India,INDIA,1763,0
Asia and Pacific (other),,2017,5071156099
World,OWID_WRL,1752,9354192
Middle East,,1751,0
International transport,,1751,0
India,IND,1751,0
Europe (other),,1751,0
China,CHN,1751,0
Asia and Pacific (other),,1751,0
Americas (other),,1751,0
Africa,,1751,0

Thanks in advance.

I have already sorted the data file by year, but I don't know how to access the 4th column and use awk or sed.

Expected dataset:

Luxembourg,LUX,2017,9294689.12
Aruba,ABW,2017,927865.82
Nepal,NPL,2017,9028196.37
Bangladesh,BGD,2017,88057460.51
Costa Rica,CRI,2017,8695008.05
Chile,CHL,2017,84603249.72
Cook Islands,COK,2017,82045.41
regex linux csv unix data-manipulation
2 answers
0
votes
The following outputs only the rows that have a 3-letter value in the second field:

awk --re-interval -F, 'tolower($2) ~ /^[a-z]{3}$/' country.txt

You could also check the length of the field, but the regex additionally ensures that the 3 characters are all letters.
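For comparison, here is a minimal sketch of the length-based check mentioned above. The three sample rows are taken from the question, and holding them in a shell variable is only for illustration. Note that `length($2) == 3` accepts any three characters, so a hypothetical code such as `A1_` would also pass, which is why the regex is the stricter test.

```shell
# Sample rows from the question, stored in a variable for illustration.
sample='Aruba,ABW,2017,927865.82
World,OWID_WRL,1752,9354192
Africa,,1751,0'

# Keep rows whose second field is exactly 3 characters long (any characters).
by_length=$(printf '%s\n' "$sample" | awk -F, 'length($2) == 3')
printf '%s\n' "$by_length"
```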

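--re-interval lets you use interval expressions (such as {3}) in the regular expression. It is a gawk option; older gawk versions treat braces in a regex literally unless it is given.

-F, tells awk that the input field separator is a comma.

print is the default action in awk, so tolower($2) ~ /^[a-z]{3}$/ is shorthand for tolower($2) ~ /^[a-z]{3}$/ {print}

tolower($2) lowercases the value of the second field, and ~ is the regex match operator. We use it to check for the start of the string (^), then [a-z] repeated exactly {3} times, then the end of the string ($).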
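The default-print shorthand described above can be checked with a small sketch. The two rows come from the question; the variable names are made up for this example, and the character class is written out three times instead of using {3} so that the sketch does not depend on interval support in whichever awk is installed.

```shell
# Two sample rows from the question; only the first has a 3-letter code.
rows='Nepal,NPL,2017,9028196.37
India,INDIA,1763,0'

# Pattern only: awk prints matching lines by default.
implicit=$(printf '%s\n' "$rows" | awk -F, 'tolower($2) ~ /^[a-z][a-z][a-z]$/')

# Same pattern with the print action written out.
explicit=$(printf '%s\n' "$rows" | awk -F, 'tolower($2) ~ /^[a-z][a-z][a-z]$/ { print }')

printf '%s\n' "$implicit"
[ "$implicit" = "$explicit" ] && echo "same output"
```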


0
votes
If I understood your question correctly, please try the following. It does not print any line whose second field is not exactly 3 letters.

awk 'BEGIN{FS=","} $2~/^[a-zA-Z]{3}$/' Input_file

If you are using an old awk in which the {3} interval does not work, try this instead.

awk 'BEGIN{FS=","} $2~/^[a-zA-Z][a-zA-Z][a-zA-Z]$/' Input_file



Explanation: here is an explanation of the code above.

awk '                   ##Start the awk program.
BEGIN{                  ##Start the BEGIN section, which runs before Input_file is read.
  FS=","                ##Set the field separator to a comma.
}                       ##Close the BEGIN section.
$2~/^[a-zA-Z]{3}$/      ##Check whether the 2nd field consists of exactly 3 letters from start to end.
                        ##awk works as condition then action: if the condition is TRUE, the action runs.
                        ##No action is given here, so the line is printed by default.
' Input_file            ##Name of the input file.
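To see which rows the filter keeps and which it drops, one option is to run the same test twice, once with ~ and once negated with !~. This is only a sketch: the temporary file stands in for Input_file, the four rows are copied from the question, and the variable names are invented for the example. It uses the spelled-out character class so that it also runs on an awk without interval support.

```shell
# Write a few rows from the question to a temporary file (illustrative only).
input_file=$(mktemp)
cat > "$input_file" <<'EOF'
Chile,CHL,2017,84603249.72
World,OWIDWRL,1755,9361520
Middle East,,1751,0
China,CHN,1751,0
EOF

# Rows that pass the filter (second field is exactly three letters).
kept=$(awk 'BEGIN{FS=","} $2 ~ /^[a-zA-Z][a-zA-Z][a-zA-Z]$/' "$input_file")

# Rows that the filter drops: the same test negated with !~.
dropped=$(awk 'BEGIN{FS=","} $2 !~ /^[a-zA-Z][a-zA-Z][a-zA-Z]$/' "$input_file")

printf 'kept:\n%s\n' "$kept"
printf 'dropped:\n%s\n' "$dropped"
rm -f "$input_file"
```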