A simple way to sort multi-line paragraphs where only the first line starts with a non-space character?


I have a text or log file which typically looks like this:

First line which is also a paragraph.
Another line that is its own paragraph.
etc. etc.

But sometimes it spills over into multi-line paragraphs:

First line which is also a paragraph.
Another line that is its own paragraph.
Now, this paragraph encompasses more than a single line
    with its second line onwards being indented by spaces
    to distinguish it from the paragraph-opener, although
    it could just as well have been tabs etc.
This is another paragraph.

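I want to sort these paragraphs lexicographically; I don't mind whether the sort key is just the first line or the whole paragraph. If these were all single-line paragraphs, then Bob's your uncle: we've got sort. But what can I do otherwise?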

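I know that, in principle, I could:

  1. Define an escaping scheme
  2. Escape every newline that is followed by whitespace (and escape the escape character itself)
  3. Sort the resulting one-line-per-paragraph file
  4. Unescape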
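
For reference, the four steps above can be sketched roughly as follows. This is only a simplified illustration, not a robust implementation: it assumes the byte \001 never occurs in the input, so it stands in for the folded newline and the escape character itself never needs escaping. The function name sort_paragraphs is made up for this example.

```shell
# Hypothetical helper: fold each paragraph onto one line, sort, unfold.
# Assumption: the input never contains the byte \001.
sort_paragraphs() {
    # Fold: before every line after the first, emit \001 if the line is a
    # continuation (starts with a space or tab), otherwise a real newline.
    awk '
        NR > 1 { printf "%s", (/^[ \t]/ ? "\001" : "\n") }
        { printf "%s", $0 }
        END { if (NR > 0) printf "\n" }
    ' "$1" |
    sort |              # one paragraph per line at this point
    tr '\001' '\n'      # unfold: turn the stand-in byte back into newlines
}

# Usage: sort_paragraphs input.txt
```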

But this seems like a bit of a hassle. Can I do better?

Note: I realize that an awk or perl script can do this fairly simply, but the closer an answer is to a one-liner, the better.

shell sorting command-line multiline paragraph
1 Answer

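A one-line pipeline: perl reads the whole file and inserts a 0 byte at the start of each paragraph (that is, before every line that begins with a non-whitespace character), sort sorts the resulting records, and finally tr strips those 0 bytes from the output. Basically a simpler version of your idea.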

$ perl -0777 -pe 's/^(?=\S)/\0/gm' input.txt | sort -z | tr -d '\0'
Another line that is its own paragraph.
First line which is also a paragraph.
Now, this paragraph encompasses more than a single line
    with its second line onwards being indented by spaces
    to distinguish it from the paragraph-opener, although
    it could just as well have been tabs etc.
This is another paragraph.

(Requires a version of sort that supports the -z option.)


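Alternatively, if you can install extra things, I found a nifty program written in perl called ptp (install it through your OS package manager if available, or with cpan App::PTP / cpanm App::PTP / another preferred CPAN client):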

$ ptp --input-separator '\n(?=\S)' --sort input.txt
Another line that is its own paragraph.
First line which is also a paragraph.
Now, this paragraph encompasses more than a single line
    with its second line onwards being indented by spaces
    to distinguish it from the paragraph-opener, although
    it could just as well have been tabs etc.
This is another paragraph.