How to make Python textX parse island grammars


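I am trying to parse network device configuration files. Although the parser has to walk through the whole file, I don't want to capture every clause in it, only a subset.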

So assume the configuration file looks like this:

bootfile abc.bin
motd "this is a
message for everyone to follow"
.
.
.
group 1
group 2
.
.
.
permit tcp ab.b.c.d b.b.b.y eq 222
permit tcp ab.b.c.d b.b.b.y eq 222
permit tcp ab.b.c.d b.b.b.y eq 222
.
.
.
interface a
  description this is interface a
  vlan 33 

interface b
  description craigs list
  no shut
  vlan 33
  no ip address
.
.
.

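I only want to capture the interface lines (as is), together with the description and vlan lines; everything else should be ignored. The content inside an interface is to be split into 2 attributes: valids and invalids.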

So the grammar looks something like this:

Config[noskipsp]:
  interfaces *= !InterfaceDefinition | InterfaceDefinition
;

InterfaceDefinition:
  interface = intf
  valids *= valid
  invalids *= invalid
;

intf: /^interface .*\n/;
cmds: /^ (description|vlan) .*\n/;
invalid: /^(?!(interface|description|vlan) .*\n;

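The goal is to end up with a Python list of interfaces, where each interface has 2 attributes, valids and invalids, each of which is a list. The valids list holds the description or vlan entries, and the invalids list holds everything else.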

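There are a couple of challenges I can't seem to solve:
1- How do I ignore everything that is not an interface definition?
2- How do I make sure every interface ends up as an interface in its own right, rather than in the invalids attribute of another interface?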

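Unfortunately, the grammar itself does not fail, but my understanding of how the parser walks through the text seems to be flawed, because it complains as soon as it tries to read any text past the "interface .*" part.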

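Also, for now I am deliberately testing with a file that contains only interface definitions, but the goal is to process full files and target only the interfaces, so everything else has to be discarded on the grammar side.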


Update on progress

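Initially, after Igor's first answer, I was able to create a grammar that successfully parsed my dummy configuration file in full, although the result was not the desired one, probably due to my own ignorance. Following Igor's second, updated answer, I decided to restructure the original grammar and simplify it to try to match my sample dummy configuration.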

My goal at the model level is to end up with objects similar to the following pseudo-structure:

class network_config:

    def __init__(self):
        self.invalid = []     # Entries that do not match any of the
                              # higher-level hierarchy objects
        self.interfaces = []  # Interface definitions

class Interface:

    def __init__(self):
        self.name = ""
        self.vlans = []
        self.description = ""
        self.junk = []  # All other configuration lines within the
                        # interface that match neither vlan
                        # nor description
The dummy configuration file (the data to be parsed) looks like this:

junk
irrelevant configuration line
interface GigabitEthernet0/0/0
   description 3 and again
   nonsense
   vlan 33
   this and that
   vlan 16
interface GigabitEthernet0/0/2
   something for the nic
   vlan 22
   description here and there
! a simple comment
intermiediate
more nonrelated information

interface GigabitEthernet0/0/3
   this is junk too
   vlan 99
don't forget this
interface GigabitEthernet0/0/1
interface GigabitEthernet0/0/9
nothing of interest
silly stuff
some final data
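
For reference, the intended result for a file like the one above can be pinned down without textX at all. The sketch below is a plain line scan, not a grammar-based solution. It assumes that indented lines belong to the preceding interface line and that any non-indented line closes the interface. The names NetworkConfig, Interface and scan are hypothetical and simply mirror the pseudo-structure above; the sample is a shortened version of the dummy file.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Interface:
    name: str
    vlans: List[int] = field(default_factory=list)
    description: str = ""
    junk: List[str] = field(default_factory=list)


@dataclass
class NetworkConfig:
    invalid: List[str] = field(default_factory=list)
    interfaces: List[Interface] = field(default_factory=list)


def scan(text: str) -> NetworkConfig:
    """Classify lines by indentation (an assumption, not a textX feature)."""
    config = NetworkConfig()
    current = None  # interface whose body is being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no information
        if raw.startswith("interface "):
            current = Interface(name=line.partition(" ")[2].strip())
            config.interfaces.append(current)
        elif current is not None and raw[0].isspace():
            keyword, _, rest = line.partition(" ")
            if keyword == "vlan" and rest.isdigit():
                current.vlans.append(int(rest))
            elif keyword == "description":
                current.description = rest
            else:
                current.junk.append(line)
        else:
            current = None  # a non-indented line closes the interface
            config.invalid.append(line)
    return config


sample = """junk
interface GigabitEthernet0/0/0
   description 3 and again
   nonsense
   vlan 33
don't forget this
interface GigabitEthernet0/0/1
"""
result = scan(sample)
for intf in result.interfaces:
    print(intf.name, intf.vlans, repr(intf.description), intf.junk)
print(result.invalid)
```

Whatever grammar is used in the end, its model should carry the same information as this result: one object per interface line, and top-level junk kept apart from junk inside an interface.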

The new textX grammar I created is as follows:

Config:
    (
        invalid*=Junk
        | interfaces*=Interface
    )*
;

Junk:
   /(?s)(?!((interface)|(vlan)|(description)).)[^\n]*\n/  // <- consume a line that does not start with 'vlan', 'description', or 'interface'
;

Interface:
   'interface' name=/[^\n]+\n/
   ( description+=Description
   | vlans*=Vlan
   | invalids*=InterfaceJunk
   )*
;

Description:
    /description[^\n]+\n/
;

Vlan:
    /vlan[^\n]+\n/
;

InterfaceJunk:
    /(?!((interface)|(vlan)|(description))).[^\n]*\n/  // <- consume everything that is not an interface, vlan, or description
;

To my surprise, when I ran it against the file, I noticed it was going into an infinite loop. I also noticed that changing the root rule from

Config:
    (
        invalid*=Junk
        | interfaces*=Interface
    )*
;

*** PARSING MODEL ***
>> Matching rule Model=Sequence at position 0 => *junk irrel
   >> Matching rule Config=ZeroOrMore in Model at position 0 => *junk irrel
      >> Matching rule OrderedChoice in Config at position 0 => *junk irrel
         >> Matching rule __asgn_zeroormore=ZeroOrMore[invalid] in Config at position 0 => *junk irrel
            ?? Try match rule Junk=RegExMatch((?s)(?!((interface)|(vlan)|(description)).)[^\n]+\n) in __asgn_zeroormore at position 0 => *junk irrel
            ++ Match 'junk
' at 0 => '*junk *irrel'
            ?? Try match rule Junk=RegExMatch((?s)(?!((interface)|(vlan)|(description)).)[^\n]+\n) in __asgn_zeroormore at position 5 => junk *irrelevant
            ++ Match 'irrelevant configuration line
' at 5 => 'junk *irrelevant configuration line *'
            ?? Try match rule Junk=RegExMatch((?s)(?!((interface)|(vlan)|(description)).)[^\n]+\n) in __asgn_zeroormore at position 35 => tion line *interface
            -- NoMatch at 35
         <<+ Matched rule __asgn_zeroormore=ZeroOrMore[invalid] in __asgn_zeroormore at position 35 => tion line *interface
      <<+ Matched rule OrderedChoice in Config at position 35 => tion line *interface
      >> Matching rule OrderedChoice in Config at position 35 => tion line *interface
         >> Matching rule __asgn_zeroormore=ZeroOrMore[invalid] in Config at position 35 => tion line *interface
            ?? Try match rule Junk=RegExMatch((?s)(?!((interface)|(vlan)|(description)).)[^\n]+\n) in __asgn_zeroormore at position 35 => tion line *interface
            -- NoMatch at 35
         <<- Not matched rule __asgn_zeroormore=ZeroOrMore[invalid] in __asgn_zeroormore at position 35 => tion line *interface
      <<- Not matched rule OrderedChoice in Config at position 35 => tion line *interface
      >> Matching rule OrderedChoice in Config at position 35 => tion line *interface
         >> Matching rule __asgn_zeroormore=ZeroOrMore[invalid] in Config at position 35 => tion line *interface
            ?? Try match rule Junk=RegExMatch((?s)(?!((interface)|(vlan)|(description)).)[^\n]+\n) in __asgn_zeroormore at position 35 => tion line *interface
            -- NoMatch at 35
         <<- Not matched rule __asgn_zeroormore=ZeroOrMore[invalid] in __asgn_zeroormore at position 35 => tion line *interface

to

Config:
    (
        invalid*=Junk interfaces*=Interface
    )*
;

*** PARSING MODEL ***
>> Matching rule Model=Sequence at position 0 => *junk irrel
   >> Matching rule Config=ZeroOrMore in Model at position 0 => *junk irrel
      >> Matching rule Sequence in Config at position 0 => *junk irrel
         >> Matching rule __asgn_zeroormore=ZeroOrMore[invalid] in Config at position 0 => *junk irrel
            ?? Try match rule Junk=RegExMatch((?s)(?!((interface)|(vlan)|(description)).)[^\n]+\n) in __asgn_zeroormore at position 0 => *junk irrel
            ++ Match 'junk
' at 0 => '*junk *irrel'
            ?? Try match rule Junk=RegExMatch((?s)(?!((interface)|(vlan)|(description)).)[^\n]+\n) in __asgn_zeroormore at position 5 => junk *irrelevant
            ++ Match 'irrelevant configuration line
' at 5 => 'junk *irrelevant configuration line *'
            ?? Try match rule Junk=RegExMatch((?s)(?!((interface)|(vlan)|(description)).)[^\n]+\n) in __asgn_zeroormore at position 35 => tion line *interface
            -- NoMatch at 35
         <<+ Matched rule __asgn_zeroormore=ZeroOrMore[invalid] in __asgn_zeroormore at position 35 => tion line *interface
         >> Matching rule __asgn_zeroormore=ZeroOrMore[interfaces] in Config at position 35 => tion line *interface
            >> Matching rule Interface=Sequence in __asgn_zeroormore at position 35 => tion line *interface
               ?? Try match rule StrMatch(interface) in Interface at position 35 => tion line *interface
               ++ Match 'interface' at 35 => 'tion line *interface* '
               >> Matching rule __asgn_plain=Sequence[name] in Interface at position 44 =>  interface* GigabitEt
                  ?? Try match rule RegExMatch([^\n]+\n) in __asgn_plain at position 45 => interface *GigabitEth
                  ++ Match 'GigabitEthernet0/0/0
' at 45 => 'interface *GigabitEthernet0/0/0 *'
               <<+ Matched rule __asgn_plain=Sequence[name] in __asgn_plain at position 66 => rnet0/0/0 *   descrip
               >> Matching rule ZeroOrMore in Interface at position 66 => rnet0/0/0 *   descrip
                  >> Matching rule OrderedChoice in Interface at position 66 => rnet0/0/0 *   descrip
                     >> Matching rule __asgn_oneormore=OneOrMore[description] in Interface at position 66 => rnet0/0/0 *   descrip
                        ?? Try match rule Description=RegExMatch(description[^\n]+\n) in __asgn_oneormore at position 69 => t0/0/0    *descriptio
                        ++ Match 'description 3 and again
' at 69 => 't0/0/0    *description 3 and again *'
                        ?? Try match rule Description=RegExMatch(description[^\n]+\n) in __asgn_oneormore at position 96 =>  again    *nonsense
                        -- NoMatch at 96

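gives 2 different results, although neither is what I was hoping for. With the first form, the parser ends up stuck in a loop, looking for the invalid pattern (i.e. Junk) over and over, while with the second form the parser gets past the search for invalids and at least finds the first interface GigabitEthernet0/0/0, although once inside the interface it falls into an infinite loop again.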

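My impression was that writing ( attr1*=pattern1 | attr2*=pattern2 | attr3*=pattern3 ) means that each pattern is tried in turn, but the parser seems to stay stuck on pattern1 for as long as pattern1 is not found (this is how ordered choice is described). There must be something in my grammar that is causing this.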
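
One general property of ordered choice in PEG-style parsers may be relevant here: an alternative that is allowed to match nothing (such as a zero-or-more repetition) always succeeds, so the alternatives after it are never tried and no input is consumed. The toy combinators below illustrate only that property. The helpers regex, zero_or_more and ordered_choice are hypothetical, written for this illustration, and are not how textX or Arpeggio is implemented.

```python
import re


def regex(pattern):
    """Return a parser that yields the new position, or None on failure."""
    compiled = re.compile(pattern)

    def parse(text, pos):
        match = compiled.match(text, pos)
        return match.end() if match else None

    return parse


def zero_or_more(parser):
    """Repeat `parser` until it fails; this combinator itself never fails."""

    def parse(text, pos):
        while True:
            new_pos = parser(text, pos)
            if new_pos is None or new_pos == pos:
                return pos  # success, possibly without consuming anything
            pos = new_pos

    return parse


def ordered_choice(*alternatives):
    """Return (index, new position) of the first alternative that succeeds."""

    def parse(text, pos):
        for index, alternative in enumerate(alternatives):
            new_pos = alternative(text, pos)
            if new_pos is not None:
                return index, new_pos
        return None

    return parse


junk = regex(r"(?!interface)[^\n]*\n")
interface = regex(r"interface[^\n]*\n")
text = "interface a\n"

# Shaped like ( invalid*=Junk | interfaces*=Interface ):
# alternative 0 succeeds without consuming anything.
starred = ordered_choice(zero_or_more(junk), zero_or_more(interface))(text, 0)
# Shaped like ( Junk | Interface ):
# alternative 0 fails, so alternative 1 gets its turn.
plain = ordered_choice(junk, interface)(text, 0)
print(starred, plain)
```

In this toy model the starred form reports success at position 0 without ever reaching the interface alternative, which is the kind of situation in which an enclosing ( ... )* cannot make progress. The plain form picks the second alternative and consumes the whole line.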

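I then went on to update the grammar to the following. This seems to get me further and out of the infinite loop, but somehow, looking at the debug output, it seems that once the parser has exhausted the alternatives inside a rule, it backtracks over the text it was on...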

Config:
    (
        (
            invalid*=Junk
            | interfaces*=Interface
        )#
    )*
;

Junk:
   /(?!((interface)|(vlan)|(description)).)[^\n]*\n/  // <- consume everything till the 'vlan', interface, or description
;

Interface[noskipws]:
   'interface'/\s*/ name=/[^\n]+\n/
   (
       ( description+=Description
       | vlans*=Vlan
       | invalids*=InterfaceJunk
       )#  // How does this get out from here - how does textx know to get out (if all 3 possibilities are not matched?)
   )*
;

Description[noskipws]:
    /\s+description[^\n]+\n/
;

Vlan[noskipws]:
    /\s+vlan[^\n]+\n/
;

InterfaceJunk[noskipws]:
    /(?!((interface)|(\s+vlan)|(\s+description)).)[^\n]*\n/  // <- consume everything till the 'vlan', interface, or description
;
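
The line-level patterns in this grammar can be checked on their own with Python's re module, outside of textX. The sketch below copies the InterfaceJunk and Junk patterns from the grammar above and tries them on a few lines taken from the dummy configuration. The helper name accepts is hypothetical, and the check deliberately ignores everything textX does around the regex (whitespace skipping, backtracking, rule order).

```python
import re

INTERFACE_JUNK = re.compile(
    r"(?!((interface)|(\s+vlan)|(\s+description)).)[^\n]*\n"
)
JUNK = re.compile(r"(?!((interface)|(vlan)|(description)).)[^\n]*\n")


def accepts(pattern, line):
    """True if `pattern` matches at the very start of `line`."""
    return pattern.match(line) is not None


lines = [
    "   nonsense\n",
    "   vlan 33\n",
    "   description 3 and again\n",
    "interface GigabitEthernet0/0/2\n",
    "don't forget this\n",
    "vlan 16\n",
]
# One row per line: (line, InterfaceJunk accepts it, Junk accepts it)
results = {
    line: (accepts(INTERFACE_JUNK, line), accepts(JUNK, line))
    for line in lines
}
for line, flags in results.items():
    print(repr(line), flags)
```

Two things stand out as far as the regexes alone are concerned. InterfaceJunk also accepts a non-indented line such as "don't forget this", so nothing in the pattern itself ties it to the body of an interface. Junk only rejects vlan and description when the keyword sits at the exact match position, which is why it rejects "vlan 16" but accepts the indented "   vlan 33" unless leading whitespace has been skipped first.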

The log now looks like:

*** PARSING MODEL ***
>> Matching rule Model=Sequence at position 0 => *junk irrel
   >> Matching rule Config=ZeroOrMore in Model at position 0 => *junk irrel
      >> Matching rule UnorderedGroup in Config at position 0 => *junk irrel
         >> Matching rule __asgn_zeroormore=ZeroOrMore[invalid] in Config at position 0 => *junk irrel
            ?? Try match rule Junk=RegExMatch((?!((interface)|(vlan)|(description)).)[^\n]*\n) in __asgn_zeroormore at position 0 => *junk irrel
            ++ Match 'junk
' at 0 => '*junk *irrel'
            ?? Try match rule Junk=RegExMatch((?!((interface)|(vlan)|(description)).)[^\n]*\n) in __asgn_zeroormore at position 5 => junk *irrelevant
            ++ Match 'irrelevant configuration line
' at 5 => 'junk *irrelevant configuration line *'
            ?? Try match rule Junk=RegExMatch((?!((interface)|(vlan)|(description)).)[^\n]*\n) in __asgn_zeroormore at position 35 => tion line *interface
            -- NoMatch at 35
         <<+ Matched rule __asgn_zeroormore=ZeroOrMore[invalid] in __asgn_zeroormore at position 35 => tion line *interface
         >> Matching rule __asgn_zeroormore=ZeroOrMore[interfaces] in Config at position 35 => tion line *interface
            >> Matching rule Interface=Sequence in __asgn_zeroormore at position 35 => tion line *interface
               ?? Try match rule StrMatch(interface) in Interface at position 35 => tion line *interface
               ++ Match 'interface' at 35 => 'tion line *interface* '
               ?? Try match rule RegExMatch(\s*) in Interface at position 44 =>  interface* GigabitEt
               ++ Match ' ' at 44 => ' interface* *GigabitEt'
               >> Matching rule __asgn_plain=Sequence[name] in Interface at position 45 => interface *GigabitEth
                  ?? Try match rule RegExMatch([^\n]+\n) in __asgn_plain at position 45 => interface *GigabitEth
                  ++ Match 'GigabitEthernet0/0/0
' at 45 => 'interface *GigabitEthernet0/0/0 *'
               <<+ Matched rule __asgn_plain=Sequence[name] in __asgn_plain at position 66 => rnet0/0/0 *   nonsens
               >> Matching rule ZeroOrMore in Interface at position 66 => rnet0/0/0 *   nonsens
                  >> Matching rule UnorderedGroup in Interface at position 66 => rnet0/0/0 *   nonsens
                     >> Matching rule __asgn_oneormore=OneOrMore[description] in Interface at position 66 => rnet0/0/0 *   nonsens
                        >> Matching rule Description=Sequence in __asgn_oneormore at position 66 => rnet0/0/0 *   nonsens
                           ?? Try match rule RegExMatch(\s+description[^\n]+\n) in Description at position 66 => rnet0/0/0 *   nonsens
                           -- NoMatch at 66
                        <<- Not matched rule Description=Sequence in Description at position 66 => rnet0/0/0 *   nonsens
                     <<- Not matched rule __asgn_oneormore=OneOrMore[description] in __asgn_oneormore at position 66 => rnet0/0/0 *   nonsens
                     >> Matching rule __asgn_zeroormore=ZeroOrMore[vlans] in Interface at position 66 => rnet0/0/0 *   nonsens
                        >> Matching rule Vlan=Sequence in __asgn_zeroormore at position 66 => rnet0/0/0 *   nonsens
                           ?? Try match rule RegExMatch(\s+vlan[^\n]+\n) in Vlan at position 66 => rnet0/0/0 *   nonsens
                           -- NoMatch at 66
                        <<- Not matched rule Vlan=Sequence in Vlan at position 66 => rnet0/0/0 *   nonsens
                     <<- Not matched rule __asgn_zeroormore=ZeroOrMore[vlans] in __asgn_zeroormore at position 66 => rnet0/0/0 *   nonsens
                     >> Matching rule __asgn_zeroormore=ZeroOrMore[invalids] in Interface at position 66 => rnet0/0/0 *   nonsens
                        >> Matching rule InterfaceJunk=Sequence in __asgn_zeroormore at position 66 => rnet0/0/0 *   nonsens
                           ?? Try match rule RegExMatch((?!((interface)|(\s+vlan)|(\s+description)).)[^\n]*\n) in InterfaceJunk at position 66 => rnet0/0/0 *   nonsens
                           ++ Match '   nonsense
' at 66 => 'rnet0/0/0 *   nonsense *'
                        <<+ Matched rule InterfaceJunk=Sequence in InterfaceJunk at position 78 =>  nonsense *   vlan 33
                        >> Matching rule InterfaceJunk=Sequence in __asgn_zeroormore at position 78 =>  nonsense *   vlan 33
                           ?? Try match rule RegExMatch((?!((interface)|(\s+vlan)|(\s+description)).)[^\n]*\n) in InterfaceJunk at position 78 =>  nonsense *   vlan 33
                           -- NoMatch at 78
                        <<- Not matched rule InterfaceJunk=Sequence in InterfaceJunk at position 78 =>  nonsense *   vlan 33
                     <<+ Matched rule __asgn_zeroormore=ZeroOrMore[invalids] in __asgn_zeroormore at position 78 =>  nonsense *   vlan 33
                     >> Matching rule __asgn_oneormore=OneOrMore[description] in Interface at position 78 =>  nonsense *   vlan 33
                        >> Matching rule Description=Sequence in __asgn_oneormore at position 78 =>  nonsense *   vlan 33
                           ?? Try match rule RegExMatch(\s+description[^\n]+\n) in Description at position 78 =>  nonsense *   vlan 33
                           -- NoMatch at 78
                        <<- Not matched rule Description=Sequence in Description at position 78 =>  nonsense *   vlan 33
                     <<- Not matched rule __asgn_oneormore=OneOrMore[description] in __asgn_oneormore at position 78 =>  nonsense *   vlan 33
                     >> Matching rule __asgn_zeroormore=ZeroOrMore[vlans] in Interface at position 78 =>  nonsense *   vlan 33
                        >> Matching rule Vlan=Sequence in __asgn_zeroormore at position 78 =>  nonsense *   vlan 33
                           ?? Try match rule RegExMatch(\s+vlan[^\n]+\n) in Vlan at position 78 =>  nonsense *   vlan 33
                           ++ Match '   vlan 33
' at 78 => ' nonsense *   vlan 33 *'
                        <<+ Matched rule Vlan=Sequence in Vlan at position 89 =>   vlan 33 *   descrip
                        >> Matching rule Vlan=Sequence in __asgn_zeroormore at position 89 =>   vlan 33 *   descrip
                           ?? Try match rule RegExMatch(\s+vlan[^\n]+\n) in Vlan at position 89 =>   vlan 33 *   descrip
                           -- NoMatch at 89
                        <<- Not matched rule Vlan=Sequence in Vlan at position 89 =>   vlan 33 *   descrip
                     <<+ Matched rule __asgn_zeroormore=ZeroOrMore[vlans] in __asgn_zeroormore at position 89 =>   vlan 33 *   descrip
                     >> Matching rule __asgn_oneormore=OneOrMore[description] in Interface at position 89 =>   vlan 33 *   descrip
                        >> Matching rule Description=Sequence in __asgn_oneormore at position 89 =>   vlan 33 *   descrip
                           ?? Try match rule RegExMatch(\s+description[^\n]+\n) in Description at position 89 =>   vlan 33 *   descrip
                           ++ Match '   description 3 and again
' at 89 => '  vlan 33 *   description 3 and again *'
                        <<+ Matched rule Description=Sequence in Description at position 116 => and again *   this an
                        >> Matching rule Description=Sequence in __asgn_oneormore at position 116 => and again *   this an
                           ?? Try match rule RegExMatch(\s+description[^\n]+\n) in Description at position 116 => and again *   this an
                           -- NoMatch at 116
                        <<- Not matched rule Description=Sequence in Description at position 116 => and again *   this an
                     <<+ Matched rule __asgn_oneormore=OneOrMore[description] in __asgn_oneormore at position 116 => and again *   this an
                  <<+ Matched rule UnorderedGroup in Interface at position 116 => and again *   this an
                  >> Matching rule UnorderedGroup in Interface at position 116 => and again *   this an
                     >> Matching rule __asgn_oneormore=OneOrMore[description] in Interface at position 116 => and again *   this an
                        >> Matching rule Description=Sequence in __asgn_oneormore at position 116 => and again *   this an
                           ?? Try match rule RegExMatch(\s+description[^\n]+\n) in Description at position 116 => and again *   this an
                           -- NoMatch at 116
                        <<- Not matched rule Description=Sequence in Description at position 116 => and again *   this an
                     <<- Not matched rule __asgn_oneormore=OneOrMore[description] in __asgn_oneormore at position 116 => and again *   this an
                     >> Matching rule __asgn_zeroormore=ZeroOrMore[vlans] in Interface at position 116 => and again *   this an
                        >> Matching rule Vlan=Sequence in __asgn_zeroormore at position 116 => and again *   this an
                           ?? Try match rule RegExMatch(\s+vlan[^\n]+\n) in Vlan at position 116 => and again *   this an
                           -- NoMatch at 116
                        <<- Not matched rule Vlan=Sequence in Vlan at position 116 => and again *   this an
                     <<- Not matched rule __asgn_zeroormore=ZeroOrMore[vlans] in __asgn_zeroormore at position 116 => and again *   this an
                     >> Matching rule __asgn_zeroormore=ZeroOrMore[invalids] in Interface at position 116 => and again *   this an
                        >> Matching rule InterfaceJunk=Sequence in __asgn_zeroormore at position 116 => and again *   this an
                           ?? Try match rule RegExMatch((?!((interface)|(\s+vlan)|(\s+description)).)[^\n]*\n) in InterfaceJunk at position 116 => and again *   this an
                           ++ Match '   this and that
' at 116 => 'and again *   this and that *'
                        <<+ Matched rule InterfaceJunk=Sequence in InterfaceJunk at position 133 =>  and that *   vlan 16
                        >> Matching rule InterfaceJunk=Sequence in __asgn_zeroormore at position 133 =>  and that *   vlan 16
                           ?? Try match rule RegExMatch((?!((interface)|(\s+vlan)|(\s+description)).)[^\n]*\n) in InterfaceJunk at position 133 =>  and that *   vlan 16
                           -- NoMatch at 133
                        <<- Not matched rule InterfaceJunk=Sequence in InterfaceJunk at position 133 =>  and that *   vlan 16
                     <<+ Matched rule __asgn_zeroormore=ZeroOrMore[invalids] in __asgn_zeroormore at position 133 =>  and that *   vlan 16
                     >> Matching rule __asgn_oneormore=OneOrMore[description] in Interface at position 133 =>  and that *   vlan 16
                        >> Matching rule Description=Sequence in __asgn_oneormore at position 133 =>  and that *   vlan 16
                           ?? Try match rule RegExMatch(\s+description[^\n]+\n) in Description at position 133 =>  and that *   vlan 16
                           -- NoMatch at 133
                        <<- Not matched rule Description=Sequence in Description at position 133 =>  and that *   vlan 16
                     <<- Not matched rule __asgn_oneormore=OneOrMore[description] in __asgn_oneormore at position 133 =>  and that *   vlan 16
                     >> Matching rule __asgn_zeroormore=ZeroOrMore[vlans] in Interface at position 133 =>  and that *   vlan 16
                        >> Matching rule Vlan=Sequence in __asgn_zeroormore at position 133 =>  and that *   vlan 16
                           ?? Try match rule RegExMatch(\s+vlan[^\n]+\n) in Vlan at position 133 =>  and that *   vlan 16
                           ++ Match '   vlan 16
' at 133 => ' and that *   vlan 16 *'
                        <<+ Matched rule Vlan=Sequence in Vlan at position 144 =>   vlan 16 *interface
                        >> Matching rule Vlan=Sequence in __asgn_zeroormore at position 144 =>   vlan 16 *interface
                           ?? Try match rule RegExMatch(\s+vlan[^\n]+\n) in Vlan at position 144 =>   vlan 16 *interface
                           -- NoMatch at 144
                        <<- Not matched rule Vlan=Sequence in Vlan at position 144 =>   vlan 16 *interface
                     <<+ Matched rule __asgn_zeroormore=ZeroOrMore[vlans] in __asgn_zeroormore at position 144 =>   vlan 16 *interface
                     >> Matching rule __asgn_oneormore=OneOrMore[description] in Interface at position 144 =>   vlan 16 *interface
                        >> Matching rule Description=Sequence in __asgn_oneormore at position 144 =>   vlan 16 *interface
                           ?? Try match rule RegExMatch(\s+description[^\n]+\n) in Description at position 144 =>   vlan 16 *interface
                           -- NoMatch at 144
                        <<- Not matched rule Description=Sequence in Description at position 144 =>   vlan 16 *interface
                     <<- Not matched rule __asgn_oneormore=OneOrMore[description] in __asgn_oneormore at position 144 =>   vlan 16 *interface
                  <<- Not matched rule UnorderedGroup in Interface at position 116 => and again *   this an
               <<+ Matched rule ZeroOrMore in Interface at position 116 => and again *   this an
            <<+ Matched rule Interface=Sequence in Interface at position 116 => and again *   this an
            >> Matching rule Interface=Sequence in __asgn_zeroormore at position 116 => and again *   this an
               ?? Try match rule StrMatch(interface) in Interface at position 116 => and again *   this an
               -- No match 'interface' at 116 => 'and again *   this a*n'
            <<- Not matched rule Interface=Sequence in Interface at position 116 => and again *   this an
         <<+ Matched rule __asgn_zeroormore=ZeroOrMore[interfaces] in __asgn_zeroormore at position 116 => and again *   this an
      <<+ Matched rule UnorderedGroup in Config at position 116 => and again *   this an
      >> Matching rule UnorderedGroup in Config at position 116 => and again *   this an
         >> Matching rule __asgn_zeroormore=ZeroOrMore[invalid] in Config at position 116 => and again *   this an
            ?? Try match rule Junk=RegExMatch((?!((interface)|(vlan)|(description)).)[^\n]*\n) in __asgn_zeroormore at position 119 =>  again    *this and t
            ++ Match 'this and that
' at 119 => ' again    *this and that *'
            ?? Try match rule Junk=RegExMatch((?!((interface)|(vlan)|(description)).)[^\n]*\n) in __asgn_zeroormore at position 136 => d that    *vlan 16 in
            -- NoMatch at 136
         <<+ Matched rule __asgn_zeroormore=ZeroOrMore[invalid] in __asgn_zeroormore at position 133 =>  and that *   vlan 16
         >> Matching rule __asgn_zeroormore=ZeroOrMore[interfaces] in Config at position 133 =>  and that *   vlan 16
            >> Matching rule Interface=Sequence in __asgn_zeroormore at position 133 =>  and that *   vlan 16
               ?? Try match rule StrMatch(interface) in Interface at position 133 =>  and that *   vlan 16
               -- No match 'interface' at 133 => ' and that *   vlan 1*6'
            <<- Not matched rule Interface=Sequence in Interface at position 133 =>  and that *   vlan 16
         <<- Not matched rule __asgn_zeroormore=ZeroOrMore[interfaces] in __asgn_zeroormore at position 133 =>  and that *   vlan 16
      <<+ Matched rule UnorderedGroup in Config at position 133 =>  and that *   vlan 16
      >> Matching rule UnorderedGroup in Config at position 133 =>  and that *   vlan 16
         >> Matching rule __asgn_zeroormore=ZeroOrMore[invalid] in Config at position 133 =>  and that *   vlan 16
            ?? Try match rule Junk=RegExMatch((?!((interface)|(vlan)|(description)).)[^\n]*\n) in __asgn_zeroormore at position 136 => d that    *vlan 16 in
            -- NoMatch at 136
         <<- Not matched rule __asgn_zeroormore=ZeroOrMore[invalid] in __asgn_zeroormore at position 133 =>  and that *   vlan 16
         >> Matching rule __asgn_zeroormore=ZeroOrMore[interfaces] in Config at position 133 =>  and that *   vlan 16
            >> Matching rule Interface=Sequence in __asgn_zeroormore at position 133 =>  and that *   vlan 16
               ?? Try match rule StrMatch(interface) in Interface at position 133 =>  and that *   vlan 16
               -- No match 'interface' at 133 => ' and that *   vlan 1*6'
            <<- Not matched rule Interface=Sequence in Interface at position 133 =>  and that *   vlan 16
         <<- Not matched rule __asgn_zeroormore=ZeroOrMore[interfaces] in __asgn_zeroormore at position 133 =>  and that *   vlan 16
      <<- Not matched rule UnorderedGroup in Config at position 133 =>  and that *   vlan 16
   <<+ Matched rule Config=ZeroOrMore in Config at position 133 =>  and that *   vlan 16
   ?? Try match rule EOF in Model at position 136 => d that    *vlan 16 in
   !! EOF not matched.
<<- Not matched rule Model=Sequence in Model at position 0 => *junk irrel
Traceback (most recent call last):
...

Any hints as to where my misunderstanding lies?


Revised working grammar

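After Igor's hints (thank you), I have been able to produce a final Grammar.tx that parses successfully and yields the desired objects in the model generated by textX (see the final answer).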

python parsing configuration textx
2 Answers

0 votes

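What you are doing is commonly known as island grammars. You can do this easily in textX, and you can just as easily extract the actual structure of the interfaces. Here is one possible solution: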

from textx import metamodel_from_str

mmstr = r'''
Config:
    (
        /(?s)((?!interface).)*/   // <- consume everything till the keyword 'interface'
        interfaces=Interface
    )*
    /(?s).*/   // <- consume all content after the last interface
;

Interface:
    'interface' name=ID
    'description' description=/[^\n]*/
    /((?!vlan).)*/  // <- consume everything till the 'vlan'
    'vlan' vlan=INT;
'''

model_str = r'''
bootfile abc.bin
motd "this is a
message for everyone to follow"
.
.
.
group 1
group 2
.
.
.
permit tcp ab.b.c.d b.b.b.y eq 222
permit tcp ab.b.c.d b.b.b.y eq 222
permit tcp ab.b.c.d b.b.b.y eq 222
.
.
.
interface a
  description this is interface a
  vlan 33

interface b
  description craigs list
  no shut
  vlan 33
  no ip address
'''

mm = metamodel_from_str(mmstr)

model = mm.model_from_str(model_str)

for i in model.interfaces:
    print(i.name, i.description, i.vlan)
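
The skipping pattern used in Config can be tried in isolation with Python's re module. The sketch below uses a made-up sample string and no textX at all. It shows that (?s)((?!interface).)* stops right before the first occurrence of the keyword, and that without the (?s) flag the same pattern stops at the first newline, because the dot no longer matches line breaks.

```python
import re

sample = "bootfile abc.bin\ngroup 1\ninterface a\n  vlan 33\n"

# With (?s), '.' also matches newlines, so the skip can span several lines.
skip_dotall = re.compile(r"(?s)((?!interface).)*")
# Without (?s), '.' stops at the first newline.
skip_single_line = re.compile(r"((?!interface).)*")

skipped = skip_dotall.match(sample).group()
remainder = sample[len(skipped):]
first_line_only = skip_single_line.match(sample).group()

print(repr(skipped))
print(repr(remainder))
print(repr(first_line_only))
```

The negative lookahead is evaluated before every single character, which is what keeps the otherwise greedy repetition from running past the keyword.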

You can always put textX in debug mode (by passing debug=True) to see the parsing process and verify your assumptions.

Addition: interface elements in arbitrary order

Interface:
    'interface' name=ID
    ('description' description=/[^\n]*/
     | 'vlan' vlan=INT
     | /(?s)(?!interface)./    // <- Consume a single char if not description or vlan or interface
    )*        // <- and then repeat
;

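This works with description and vlan in any position, but note that the attributes on the resulting Python object will now be lists, since this grammar supports multiple vlan and description entries per interface. You can work around this by registering an object processor for Interface and extracting the single element of each list. For example: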

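def interface_processor(obj):
    # description and vlan arrive as lists; keep only the first entry of each
    if obj.description:
        obj.description = obj.description[0]
    if obj.vlan:
        obj.vlan = obj.vlan[0]

# Register on the metamodel (mm) before parsing the model
mm.register_obj_processors({'Interface': interface_processor})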

Don't forget to register this object processor. See the textX documentation for details.


0 votes

Config:
    (
            Junk
            | interfaces=Interface
    )*
;

Junk:
   /(?!((interface)|(vlan)|(description)).)[^\n]*\n/
;

Interface[noskipws]:
   'interface'/\s*/ name=/[^\n]+\n/
   (
       description+=Description
       | vlans=Vlan
       | invalids=InterfaceJunk
   )*
;

Description[noskipws]:
    /\s+description[^\n]+\n/
;

Vlan[noskipws]:
    /\s+vlan[^\n]+\n/
;

InterfaceJunk[noskipws]:
    /(?!((interface)|(\s+vlan)|(\s+description)).)[^\n]*\n/
;

With the following changes:
