Pyparsing paragraphs

Question (votes: 0, answers: 2)

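I'm having a small problem with pyparsing that I can't seem to solve. I want to write a rule that parses multi-line paragraphs for me. The end goal is a recursive grammar that will parse something like: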

Heading: awesome
    This is a paragraph and then
    a line break is inserted
    then we have more text

    but this is also a different line
    with more lines attached

    Other: cool
        This is another indented block
        possibly with more paragraphs

        This is another way to keep this up
        and write more things

    But then we can keep writing at the old level
    and get this

into something like HTML, so perhaps the following (of course, with the parse tree I can convert it into whatever format I like):

<Heading class="awesome">

    <p> This is a paragraph and then a line break is inserted then we have more text </p>

    <p> but this is also a different line with more lines attached</p>

    <Other class="cool">
        <p> This is another indented block possibly with more paragraphs</p>
        <p> This is another way to keep this up and write more things</p>
    </Other>

    <p> But then we can keep writing at the old level and get this</p>
</Heading>
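
To make the target concrete, here is a rough sketch of how a nested parse tree could be rendered into the markup above. The tree shape (a `(tag, css_class, children)` tuple with plain strings for paragraphs) and the `render` function are assumptions made up for this illustration; they are not something pyparsing produces on its own.

```python
def render(node, depth=0):
    """Render a hypothetical parse tree into indented HTML-like lines."""
    pad = "    " * depth
    if isinstance(node, str):
        # A plain string stands for one already-joined paragraph.
        return [f"{pad}<p> {node} </p>"]
    tag, css_class, children = node
    lines = [f'{pad}<{tag} class="{css_class}">']
    for child in children:
        lines.extend(render(child, depth + 1))
    lines.append(f"{pad}</{tag}>")
    return lines


# Hypothetical tree for a shortened version of the example input.
tree = ("Heading", "awesome", [
    "This is a paragraph",
    ("Other", "cool", ["This is another indented block"]),
    "But then we can keep writing at the old level",
])
print("\n".join(render(tree)))
```

Whatever grammar ends up producing the tree, keeping the rendering step separate like this means the output format can be swapped without touching the parser.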

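Progress

I have reached the stage where I can parse the heading lines and the indented blocks with pyparsing. But I cannot:

  • Define a paragraph as multiple lines that should be joined
  • Allow paragraphs to be indented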

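An example

Starting from here, I can get the paragraphs output onto a single line, but there doesn't seem to be a way to turn this into a parse tree without removing the line breaks.

I believe a paragraph should be: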

words = ...  # I've defined words to allow the set of characters I need
lines = OneOrMore(words)
paragraph = OneOrMore(lines) + lineEnd

But this doesn't seem to work for me. Any ideas would be great :)

python parsing pyparsing
2 answers

3 votes

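So I managed to solve this, for anyone who stumbles upon this question in the future. You can define a paragraph as shown here, although it is certainly not ideal and does not exactly match the grammar I described. The relevant code is: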

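from pyparsing import CharsNotIn, OneOrMore, Suppress, lineEnd

# One text line: everything up to the newline, with the newline itself suppressed
line = OneOrMore(CharsNotIn('\n')) + Suppress(lineEnd)
# Negative lookahead: succeeds only where no further text line follows
emptyline = ~line
paragraph = OneOrMore(line) + emptyline
paragraph.setParseAction(join_lines)  # join_lines must be defined before this line runs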

where join_lines is defined as:

def join_lines(tokens):
    stripped = [t.strip() for t in tokens]
    joined = " ".join(stripped)
    return joined

If this fits your needs, it should point you in the right direction :) I hope it helps!
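
For anyone who wants to try this out, here is a minimal end-to-end sketch that puts the two snippets above together and runs them on a small input. The sample text and the use of `searchString` to collect every paragraph are my own additions for illustration, not part of the original solution.

```python
from pyparsing import CharsNotIn, OneOrMore, Suppress, lineEnd


def join_lines(tokens):
    # Strip each line's indentation and join the lines with single spaces.
    return " ".join(t.strip() for t in tokens)


# A text line: everything up to the newline, then the (suppressed) newline.
line = OneOrMore(CharsNotIn("\n")) + Suppress(lineEnd)
# A paragraph: one or more text lines, ending where no further text line follows.
paragraph = OneOrMore(line) + ~line
paragraph.setParseAction(join_lines)

# Hypothetical sample input, made up for this sketch.
sample = "first line\nsecond line\n\nnext paragraph\n"

# searchString scans the whole input and collects every paragraph match.
paragraphs = [match[0] for match in paragraph.searchString(sample)]
print(paragraphs)
```

Note that `~line` only looks ahead and consumes nothing, so the blank line between paragraphs is skipped by the scan rather than matched by the grammar itself.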

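A better empty line

The empty-line definition given above is definitely not ideal and can be improved considerably. The best approach I have found is: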

from pyparsing import LineEnd, LineStart, Suppress, ZeroOrMore

empty_line = Suppress(LineStart() + ZeroOrMore(" ") + LineEnd())
empty_line.setWhitespaceChars("")

This lets you pad empty lines with spaces without breaking the match.
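
To show the behaviour this is aiming for, here is a small stand-in written with the standard `re` module rather than pyparsing: a line containing only spaces or tabs still counts as a paragraph separator. The `split_paragraphs` helper and the sample text are hypothetical, and this is only a sketch of the intended result, not a replacement for the grammar above.

```python
import re

# A blank line may be padded with spaces or tabs (assumption for this sketch).
BLANK_LINE = re.compile(r"\n[ \t]*\n")


def split_paragraphs(text):
    """Split text on (possibly padded) blank lines and join each paragraph's lines."""
    paragraphs = []
    for chunk in BLANK_LINE.split(text.strip("\n")):
        lines = [ln.strip() for ln in chunk.splitlines() if ln.strip()]
        if lines:
            paragraphs.append(" ".join(lines))
    return paragraphs


# The separator line here holds three spaces instead of being truly empty.
sample = "first line\nsecond line\n   \nnext paragraph\n"
print(split_paragraphs(sample))
```

A quick check like this can serve as a reference when testing whether the pyparsing version treats padded blank lines the same way.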


0 votes

Thank you very much. I couldn't find this anywhere else.
