I am looking for a simple solution to make every line of a file (a CSV file) contain the same number of commas.
For example,
sample file:
1,1
A,B,C,D,E,F
2,2,
3,3,3,
4,4,4,4
Expected:
1,1,,,,
A,B,C,D,E,F
2,2,,,,
3,3,3,,,
4,4,4,4,,
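In this case the line with the most commas has 5 commas (line 2). So I want to add extra commas to all the other lines so that every line has the same number (i.e. 5 commas).
Using awk: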
$ awk 'BEGIN{FS=OFS=","} {$6=$6} 1' file
1,1,,,,
A,B,C,D,E,F
2,2,,,,
3,3,3,,,
4,4,4,4,,
As you can see above, with this approach the maximum number of fields must be hard-coded in the command.
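As a minimal sketch, the field count can be passed in as a variable rather than written into the program text. The function name pad_to_n and the variable n are assumptions made for illustration, not part of the answer above, and the target count still has to be known in advance. Assigning to a field beyond NF makes awk create the missing empty fields and rebuild the record with OFS, which is what adds the commas.

```shell
# pad_to_n N [file...]  (hypothetical helper, not from the original answer)
# Pads every record to N comma-separated fields; longer records are left as-is.
pad_to_n() {
    n=$1
    shift
    # NF < n guards against touching records that already have n or more fields.
    awk -v n="$n" 'BEGIN { FS = OFS = "," } NF < n { $n = $n } 1' "$@"
}

# Demo on part of the sample data, read from standard input.
printf '1,1\nA,B,C,D,E,F\n2,2,\n' | pad_to_n 6
```

Called as pad_to_n 6 file, it should behave like the one-liner above for the sample file.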
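Another approach makes all lines in the CSV file have the same number of fields without the number of fields having to be known in advance. The max number of fields is computed
first, and a substring of the needed commas is then appended to each record, e.g.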
awk -F, -v max=0 '{
    lines[n++] = $0             # store lines indexed by line number
    fields[lines[n-1]] = NF     # store number of fields indexed by $0
    if (NF > max)               # find max NF value
        max = NF
}
END {
    for (i = 0; i < max; i++)   # form string with max commas
        commastr = commastr ","
    for (i = 0; i < n; i++)     # loop appending substring of commas
        printf "%s%s\n", lines[i], substr(commastr, 1, max - fields[lines[i]])
}' file
Example Use/Output
Pasting it at the command line, you will receive:
$ awk -F, -v max=0 '{
> lines[n++] = $0 # store lines indexed by line number
> fields[lines[n-1]] = NF # store number of field indexed by $0
> if (NF > max) # find max NF value
> max = NF
> }
> END {
> for(i=0;i<max;i++) # form string with max commas
> commastr=commastr","
> for(i=0;i<n;i++) # loop appended substring of commas
> printf "%s%s\n", lines[i], substr(commastr,1,max-fields[lines[i]])
> }' file
1,1,,,,
A,B,C,D,E,F
2,2,,,,
3,3,3,,,
4,4,4,4,,
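Because this approach buffers the records in memory, it can also read from a pipe, where the input cannot be read a second time. The following is a minimal sketch of that variant. It keys the field counts by record number instead of by $0, and it treats an empty line as one empty field so that it ends up with the same number of commas as the other lines. The function name pad_stream and that empty-line choice are assumptions, not part of the answer above.

```shell
# pad_stream [file...]  (hypothetical helper, not from the original answer)
# Reads all records, finds the largest field count, then pads each record.
pad_stream() {
    awk -F, '
    {
        line[NR] = $0                 # keep every record in memory
        nf[NR] = (NF ? NF : 1)        # assumption: empty line counts as 1 field
        if (NF > max) max = NF        # track the largest field count
    }
    END {
        for (i = 1; i <= NR; i++) {
            pad = ""
            for (j = nf[i]; j < max; j++)   # one comma per missing field
                pad = pad ","
            print line[i] pad
        }
    }' "$@"
}
```

Usage would look like some_command | pad_stream, or pad_stream file. Memory use grows with the size of the input, so a very large file may be better served by a solution that reads the file twice.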
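Could you please try the following, written in a more generic way. This code works even if the number of fields is not the same across the lines of your Input_file. It first reads the whole file to get the maximum number of fields; then, while reading the file a second time, it resets the last field of each line (why: because OFS is set to a comma, if the current line has fewer fields than the nf value, the missing commas are added to that line). It is an enhanced version of @oguz ismail's answer.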
awk '
BEGIN{
  FS=OFS=","
}
FNR==NR{
  nf=nf>NF?nf:NF
  next
}
{
  $nf=$nf
}
1
' Input_file Input_file
Explanation: Adding a detailed explanation for the above code.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of awk program from here.
FS=OFS="," ##Setting FS and OFS as comma for all lines here.
}
FNR==NR{ ##Checking condition FNR==NR, which is TRUE only while Input_file is being read the first time.
nf=nf>NF?nf:NF ##Setting variable nf to the larger of nf and NF: if nf is greater than NF keep it as it is, else set it to NF.
next ##next will skip all further statements from here.
}
{
$nf=$nf ##Assigning $nf to itself rebuilds the current line with OFS and adds comma(s) at the end of the line if NF is less than nf.
}
1 ##1 will print edited/non-edited lines here.
' Input_file Input_file ##Mentioning Input_file names here.
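To check the result of any of the solutions above, one option is a small test that every line has the same number of comma-separated fields. This is only a sketch; the function name same_field_count is an assumption and does not come from the answers.

```shell
# same_field_count [file...]  (hypothetical helper, not from the original answers)
# Exit status 0 if all records have the same number of fields, 1 otherwise.
same_field_count() {
    awk -F, '
    NR == 1 { n = NF }              # remember the field count of the first record
    NF != n { bad = 1; exit }       # first mismatch: jump straight to END
    END     { exit bad + 0 }        # 0 when no mismatch was seen
    ' "$@"
}
```

For example, same_field_count file && echo consistent prints consistent only when the file is already rectangular.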