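My CSV file is malformed: its lines contain varying numbers of delimiters (the delimiter is a comma). I need to split the source file into multiple files based on the number of delimiters on each line.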
$ cat ~/Desktop/sample_file.csv
1001,Pink,Panther,Seattle,WA,98001,favorite character
1002,Micky,Mouse,Bellevue,WA,98002
1003,Mini,Mouse,Bellevue,WA,98003
1004,Donald,Duck,Redmond,WA,98004,So funny
1005,Tom,Jerry,Lynwood,WA
1006,Woody,Woodpacker,Lynwood,WA
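I need a Unix command that splits it into multiple files according to the number of delimiters. For example, the sample file above should be split into 3 files: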
$ cat sample_file_6_delimiter.csv
1001,Pink,Panther,Seattle,WA,98001,favorite character
1004,Donald,Duck,Redmond,WA,98004,So funny
$ cat sample_file_5_delimiter.csv
1002,Micky,Mouse,Bellevue,WA,98002
1003,Mini,Mouse,Bellevue,WA,98003
$ cat sample_file_4_delimiter.csv
1005,Tom,Jerry,Lynwood,WA
1006,Woody,Woodpacker,Lynwood,WA
One awk idea:
awk '
BEGIN  { FS = OFS = "," }            # set input/output field separator to ","; setting OFS is optional here since we always print the entire line ($0)
FNR==1 { basename = FILENAME         # 1st record of file: make a copy of the input FILENAME
         sub(/\.csv$/,"",basename)   # strip ".csv" off the end of the filename (dot escaped so it matches a literal ".")
       }
       { print $0 > (basename "_" (NF-1) "_delimiter.csv") }   # "NF" == number of fields, "NF-1" == number of delimiters
' sample_file.csv
This generates:
$ head sample*delimiter.csv
==> sample_file_4_delimiter.csv <==
1005,Tom,Jerry,Lynwood,WA
1006,Woody,Woodpacker,Lynwood,WA
==> sample_file_5_delimiter.csv <==
1002,Micky,Mouse,Bellevue,WA,98002
1003,Mini,Mouse,Bellevue,WA,98003
==> sample_file_6_delimiter.csv <==
1001,Pink,Panther,Seattle,WA,98001,favorite character
1004,Donald,Duck,Redmond,WA,98004,So funny
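A possible variant, offered as a sketch rather than a drop-in replacement: instead of deriving the delimiter count from NF-1, count the commas directly with gsub(), which returns the number of substitutions it made. The difference only matters for blank lines, where NF is 0 and NF-1 would produce a file named with -1; counting commas yields 0 instead. The script below recreates the sample data in the current directory for illustration (it overwrites any existing sample_file.csv there); the variable names line and n are my own.

```shell
#!/bin/sh
# Illustrative setup: recreate the sample data from the question in the current directory.
cat > sample_file.csv <<'EOF'
1001,Pink,Panther,Seattle,WA,98001,favorite character
1002,Micky,Mouse,Bellevue,WA,98002
1003,Mini,Mouse,Bellevue,WA,98003
1004,Donald,Duck,Redmond,WA,98004,So funny
1005,Tom,Jerry,Lynwood,WA
1006,Woody,Woodpacker,Lynwood,WA
EOF

awk '
FNR==1 { basename = FILENAME              # 1st record of file: copy the input FILENAME
         sub(/\.csv$/, "", basename)      # strip the literal ".csv" suffix
       }
       { line = $0                        # work on a copy so $0 is left untouched
         n = gsub(/,/, "", line)          # gsub() returns the number of commas it replaced
         print $0 > (basename "_" n "_delimiter.csv")
       }
' sample_file.csv
```

On the sample data this should produce the same three files as the NF-1 version. Note that neither approach understands quoted fields: a comma inside "double quotes" is still counted as a delimiter.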