How can I split this text into multi-line records?

#!/usr/bin/bash

mailing_list="Jane Doe
123 Main Street
Anywhere, SE 12345-6789

John Smith
456 Tree-lined Avenue
Smallville, MW 98765-4321



Amir Faquer
C. de la Lusitania 98
08206 Sabadell
        
Amir Faquer w spaces before
C. de la Lusitania 98
08206 Sabadel
    
      
      
      
Wife w spaces before
C. de la Lusitania 98
08206 Sabadell
"
echo "$mailing_list"|awk -v RS='' -v FS='\n' '/.*/ 
END {print "The number of records is "NR"."}'

echo "$mailing_list"|awk -v RS='\n\n+' -v FS='\n' '/.*/ 
END {print "The number of records is "NR"."}'

echo "$mailing_list"|awk -v RS='\n *\n+' -v FS='\n' '/.*/ 
END {print "The number of records is "NR"."}'


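How can I split this mailing list into multi-line records, and not only in the case that RS='\n\n+' handles, where the blank lines are completely empty? The last command in my script reports 7 records, which is wrong: there are only 5. I also want blank lines containing any amount of whitespace to act as the RS. How can I do that?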

regex awk multiline
1 answer

I'm new to awk but interested in learning the tool. I gave it a try, and I think I got what you want:

#!/usr/bin/bash

mailing_list="Jane Doe
123 Main Street
Anywhere, SE 12345-6789

John Smith
456 Tree-lined Avenue
Smallville, MW 98765-4321



Amir Faquer
C. de la Lusitania 98
08206 Sabadell

Amir Faquer w spaces before
C. de la Lusitania 98
08206 Sabadel




Wife w spaces before
C. de la Lusitania 98
08206 Sabadell
"

# Note: \s is a GNU awk (gawk) extension, and POSIX does not guarantee regex support in RS.
echo "$mailing_list" | awk 'BEGIN { RS = "(\\n\\s*){2,}" ; FS = "\\n" }
{
        print "Record n°" NR
        print "----------"
        print "Name   : " $1
        print "Street : " $2
        print "City   : " $3
        print ""
}
END {
        print ""
        print "The number of records is " NR "."
}'

This outputs the following:

Record n°1
----------
Name   : Jane Doe
Street : 123 Main Street
City   : Anywhere, SE 12345-6789

Record n°2
----------
Name   : John Smith
Street : 456 Tree-lined Avenue
City   : Smallville, MW 98765-4321

Record n°3
----------
Name   : Amir Faquer
Street : C. de la Lusitania 98
City   : 08206 Sabadell

Record n°4
----------
Name   : Amir Faquer w spaces before
Street : C. de la Lusitania 98
City   : 08206 Sabadel

Record n°5
----------
Name   : Wife w spaces before
Street : C. de la Lusitania 98
City   : 08206 Sabadell


The number of records is 5.

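The problem seems to come down to how the regular expressions for the RS and FS variables have to be written. In a double-quoted awk string, "\n" is turned into an actual newline character before the regex engine ever sees it, whereas the regex engine should receive "\" followed by "n". That means writing "\\n\\n+" instead of "\n\n+". For \n itself both spellings end up matching a newline, but the doubled form is the one that still works for escapes the string parser does not know, such as \s.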

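As a quick check of the escaping rule, here is a minimal sketch. It assumes an awk that treats a multi-character RS as a regex (gawk and mawk do), and the sample variable is made-up data, not part of the question.

```shell
#!/usr/bin/bash

# Hypothetical sample: three two-line records separated by empty lines only.
sample='a
b

c
d


e
f'

# Doubled backslashes: awk's string parser yields \n\n+ and the regex
# engine interprets the \n escapes itself.
escaped=$(printf '%s\n' "$sample" | awk 'BEGIN { RS = "\\n\\n+" } END { print NR }')

# Single backslashes: the string parser produces real newline characters,
# and a literal newline in a regex matches a newline as well.
literal=$(printf '%s\n' "$sample" | awk 'BEGIN { RS = "\n\n+" } END { print NR }')

echo "escaped=$escaped literal=$literal"
```

With gawk or mawk both counts should come out as 3. The difference between the two spellings only shows up with escapes that the string parser does not recognize.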
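Then, to handle the optional whitespace-only lines, it is better to use the regex escape \s than a literal space, because it also matches tabs. So, to match two or more newlines with optional whitespace between them, you can use (\n\s*){2,}, which has to be written in a double-quoted string as "(\\n\\s*){2,}".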

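If \s or a regex-valued RS is not available in your awk, one possible alternative (not the approach used above, just a sketch) is to blank out whitespace-only lines first and then rely on paragraph mode, RS="", which every POSIX awk should support. The sample variable below is hypothetical data with spaces and a tab on the separator lines.

```shell
#!/usr/bin/bash

# Hypothetical sample: separator lines hold spaces, a tab, or nothing at all.
sample=$(printf 'Jane Doe\n123 Main Street\n   \nJohn Smith\n456 Tree-lined Avenue\n \t \n\n  \nAmir Faquer\nC. de la Lusitania 98\n')

# Step 1: turn whitespace-only lines into truly empty lines.
# Step 2: paragraph mode (RS = "") splits records on runs of empty lines;
#         FS = "\n" makes each line of a record one field.
names=$(printf '%s\n' "$sample" |
  awk '{ sub(/^[ \t]+$/, ""); print }' |
  awk 'BEGIN { RS = ""; FS = "\n" } { print NR ": " $1 }')

echo "$names"
```

This prints one numbered name per record. The trade-off is a second awk process in the pipeline in exchange for not depending on gawk-specific regex features.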