Split a file based on the number of delimiters


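I have a malformed CSV file whose lines contain varying numbers of delimiters (the delimiter is a comma). I need to split the source file into multiple files based on the number of delimiters per line.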

$ cat ~/Desktop/sample_file.csv
1001,Pink,Panther,Seattle,WA,98001,favorite character
1002,Micky,Mouse,Bellevue,WA,98002
1003,Mini,Mouse,Bellevue,WA,98003
1004,Donald,Duck,Redmond,WA,98004,So funny
1005,Tom,Jerry,Lynwood,WA
1006,Woody,Woodpacker,Lynwood,WA

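I need a Unix command that splits it into multiple files according to the number of delimiters. For example, the sample file above should be split into 3 files: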

$ cat sample_file_6_delimiter.csv
1001,Pink,Panther,Seattle,WA,98001,favorite character
1004,Donald,Duck,Redmond,WA,98004,So funny

$ cat sample_file_5_delimiter.csv
1002,Micky,Mouse,Bellevue,WA,98002
1003,Mini,Mouse,Bellevue,WA,98003

$ cat sample_file_4_delimiter.csv
1005,Tom,Jerry,Lynwood,WA
1006,Woody,Woodpacker,Lynwood,WA
shell unix awk
1 Answer

One awk idea:

awk '
BEGIN  { FS = OFS = "," }                                     # set input/output field separator to ","; OFS is optional here since we always print the entire line ($0)
FNR==1 { basename = FILENAME                                  # 1st record of each file: copy the input FILENAME
         sub(/\.csv$/,"",basename)                            # strip ".csv" off the end of the filename (dot escaped so it matches literally)
       }
       { print $0 > (basename "_" (NF-1) "_delimiter.csv") }  # "NF" == number of fields, "NF-1" == number of delimiters
' sample_file.csv

This generates:

$ head sample*delimiter.csv
==> sample_file_4_delimiter.csv <==
1005,Tom,Jerry,Lynwood,WA
1006,Woody,Woodpacker,Lynwood,WA

==> sample_file_5_delimiter.csv <==
1002,Micky,Mouse,Bellevue,WA,98002
1003,Mini,Mouse,Bellevue,WA,98003

==> sample_file_6_delimiter.csv <==
1001,Pink,Panther,Seattle,WA,98001,favorite character
1004,Donald,Duck,Redmond,WA,98004,So funny
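
If the input may contain blank lines, or many distinct delimiter counts, a more defensive variant could look like the sketch below. It is not part of the original answer: the function name split_by_delims and the array name seen are hypothetical, chosen only for illustration. It skips empty lines (which would otherwise end up in a file named ..._-1_delimiter.csv) and closes each output file after every write, so awk implementations with a low limit on simultaneously open files are less likely to fail. The trade-off is slower output on large inputs.

```shell
# Hypothetical helper (not from the original answer): split_by_delims FILE
# Writes each non-empty line of FILE to <FILE minus .csv>_<N>_delimiter.csv,
# where N is the number of commas on that line.
split_by_delims() {
    awk '
    BEGIN   { FS = "," }
    FNR==1  { base = FILENAME; sub(/\.csv$/, "", base) }   # strip a trailing ".csv"
    NF == 0 { next }                                       # skip empty lines
            { out = base "_" (NF-1) "_delimiter.csv"
              if (!(out in seen)) {                        # first use in this run:
                  printf "" > out; close(out)              # truncate any stale file
                  seen[out] = 1
              }
              print >> out                                 # append, then release the handle
              close(out)
            }
    ' "$1"
}
```

Usage: split_by_delims sample_file.csv writes the output files next to the input file, because the output names are derived from FILENAME.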