Parse a file and create a data structure


We want to parse a file and build some kind of data structure (in Python) to use later. The file contents look like this:

plan HELLO
   feature A 
       measure X :
          src = "Type ,Name"
       endmeasure //X

       measure Y :
        src = "Type ,Name"
       endmeasure //Y

       feature Aa
           measure AaX :
              src = "Type ,Name"
                    "Type ,Name2"
                    "Type ,Name3"
           endmeasure //AaX

           measure AaY :
              src = "Type ,Name"
           endmeasure //AaY
           
           feature Aab
              .....
           endfeature // Aab
         
       endfeature //Aa
 
   endfeature // A
   
   feature B
     ......
   endfeature //B
endplan

plan HOLA
endplan //HOLA

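So a file contains one or more plans; each plan contains one or more features; each feature contains measures that carry information (src, with a Type and a Name); and a feature can itself contain further features.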

We need to parse the file and build a data structure like this:

                     plan (HELLO) 
            ------------------------------
             ↓                          ↓ 
          Feature A                  Feature B
  ----------------------------          ↓
   ↓           ↓             ↓           ........
Measure X    Measure Y    Feature Aa
                         ------------------------------
                            ↓           ↓             ↓ 
                       Measure AaX   Measure AaY   Feature Aab
                                                        ↓
                                                        .......
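One way to hold this tree in memory is sketched below with Python dataclasses. The class and field names (Plan, Feature, Measure, src) are assumptions chosen to mirror the file's keywords, and each src string is assumed to split at the comma into a (Type, Name) pair. The hello object is the HELLO plan from the diagram, built by hand; Feature B and Feature Aab are left empty because their contents are elided in the question.

```python
from dataclasses import dataclass, field


@dataclass
class Measure:
    name: str
    # one (Type, Name) pair per quoted src string (assumed layout)
    src: list[tuple[str, str]] = field(default_factory=list)


@dataclass
class Feature:
    name: str
    measures: list[Measure] = field(default_factory=list)
    # features can contain further features, to any depth
    features: list["Feature"] = field(default_factory=list)


@dataclass
class Plan:
    name: str
    features: list[Feature] = field(default_factory=list)


# The HELLO plan from the diagram, built by hand
hello = Plan("HELLO", [
    Feature(
        "A",
        measures=[Measure("X", [("Type", "Name")]), Measure("Y", [("Type", "Name")])],
        features=[
            Feature(
                "Aa",
                measures=[
                    Measure("AaX", [("Type", "Name"), ("Type", "Name2"), ("Type", "Name3")]),
                    Measure("AaY", [("Type", "Name")]),
                ],
                features=[Feature("Aab")],  # contents elided in the question
            )
        ],
    ),
    Feature("B"),  # contents elided in the question
])

# A file can hold several plans, so the top level is a list
plans = [hello, Plan("HOLA")]
```

A parser then only has to produce a list of Plan objects; anything downstream can walk plan.features recursively without caring about the file syntax.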

I'm trying to parse the file line by line and build a list of lists holding plan -> feature -> measure, feature:

def getplans(s):
    stack = [{}]
    stack_list = []
    
    for line in s.splitlines():
        if ": " in line:  # leaf
            temp_stack = {}
            key, value = line.split(": ", 1)
            key = key.replace("source","").replace("=","").replace("\"","").replace(";","")
            value = value.replace("\"","").replace(",","").replace(";","")
            temp_stack[key.strip()] = value.strip()
            stack_list.append(temp_stack)
            stack[-1]["MEASURED_VAL"] = stack_list
        elif line.strip()[:3] == "end":
            stack.pop()
            stack_list = []
        elif line.strip():
            collection, name, *_ = line.split()
            stack.append({})
            stack[-2].setdefault(collection, {})[name] = stack[-1] 
    return stack[0]
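As far as I can tell, this attempt goes wrong because none of the sample lines contain ": " (the measure lines end in " :" and the src lines use "="), so the leaf branch never runs. The src lines and the quoted continuation lines instead fall through to the last branch and push extra levels onto the stack, which the end lines then pop at the wrong depth. Below is a minimal sketch of the same line-by-line, stack-based idea, assuming every block is closed by a matching endplan/endfeature/endmeasure line and every src entry is a double-quoted "Type ,Name" string. The function name parse_plans and the node layout (kind, name, children, src) are illustrative choices, not part of the original code.

```python
import re

# "plan HELLO", "feature Aa", "measure AaX :" -> (keyword, name)
OPEN_RE = re.compile(r"^(plan|feature|measure)\s+([^\s:]+)")
# "endmeasure //X", "endfeature // Aab", "endplan" -> keyword
CLOSE_RE = re.compile(r"^end(plan|feature|measure)\b")
# every double-quoted "Type ,Name" string on a src or continuation line
QUOTED_RE = re.compile(r'"([^"]*)"')


def parse_plans(text):
    """Return a list of plan nodes; each node is a dict with
    kind, name, children (nested nodes) and src ((Type, Name) tuples)."""
    plans = []
    stack = []  # currently open blocks, innermost last
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = CLOSE_RE.match(line)
        if m:
            # the end keyword must match the innermost open block
            if not stack or stack[-1]["kind"] != m.group(1):
                raise ValueError(f"unexpected {line!r}")
            stack.pop()
            continue
        m = OPEN_RE.match(line)
        if m:
            node = {"kind": m.group(1), "name": m.group(2), "children": [], "src": []}
            if stack:
                stack[-1]["children"].append(node)  # nested feature or measure
            else:
                plans.append(node)  # top-level plan
            stack.append(node)
            continue
        # 'src = "Type ,Name"' or a bare continuation line '"Type ,Name2"';
        # lines without quotes (e.g. ".....") are ignored
        if stack:
            for quoted in QUOTED_RE.findall(line):
                typ, _, nm = quoted.partition(",")
                stack[-1]["src"].append((typ.strip(), nm.strip()))
    if stack:
        raise ValueError(f"unclosed {stack[-1]['kind']} {stack[-1]['name']}")
    return plans


sample = """\
plan HELLO
   feature A
       measure X :
          src = "Type ,Name"
       endmeasure //X
       feature Aa
           measure AaX :
              src = "Type ,Name"
                    "Type ,Name2"
           endmeasure //AaX
       endfeature //Aa
   endfeature // A
endplan

plan HOLA
endplan //HOLA
"""

plans = parse_plans(sample)
# plan HELLO -> feature A -> feature Aa -> measure AaX
print(plans[0]["children"][0]["children"][1]["children"][0]["src"])
# [('Type', 'Name'), ('Type', 'Name2')]
```

Because each end line is checked against the innermost open block, a missing or misspelled end keyword raises a ValueError instead of silently shifting the nesting.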
Answer

Looking at the file, I would convert plan/feature/measure into tags and then parse the result with an HTML parser such as beautifulsoup (alternatively, you could convert it to YAML and use a YAML parser):

text = """\
plan HELLO
   feature A
       measure X :
          src = "Type ,Name"
       endmeasure //X

       measure Y :
        src = "Type ,Name"
       endmeasure //Y

       feature Aa
           measure AaX :
              src = "Type ,Name"
                    "Type ,Name2"
                    "Type ,Name3"
           endmeasure //AaX

           measure AaY :
              src = "Type ,Name"
                    "Type ,Name2"
                    "Type ,Name3"
           endmeasure //AaY

           feature Aab
              .....
           endfeature // Aab

       endfeature //Aa

   endfeature // A

   feature B
     ......
   endfeature //B
endplan

plan HOLA
endplan //HOLA"""

import re

from bs4 import BeautifulSoup

data = re.sub(r"\b(plan|feature|measure)\s+([^:\s]+).*", r'<\g<1> name="\g<2>">', text)
data = re.sub(r"\b(?:end)(plan|feature|measure).*", r"</\g<1>>", data)
data = re.sub(r'src\s*=\s*((?:"[^"]+"\s*)+)', r"<src>\g<1></src>", data)

soup = BeautifulSoup(data, "html.parser")

for m in soup.select("measure"):
    # find the enclosing plan:
    print("Plan:", m.find_parent("plan")["name"])
    # find the nearest enclosing feature:
    print("Parent Feature:", m.find_parent("feature")["name"])
    print("Name:", m["name"])
    # each src line looks like "Type ,Name"; split it into [Type, Name]
    for line in m.text.splitlines():
        fields = list(map(str.strip, line.strip(' "').split(",")))
        if len(fields) == 2:
            print(fields)

The converted text will be:

<plan name="HELLO">
   <feature name="A">
       <measure name="X">
          <src>"Type ,Name"
       </src></measure>
                                                    
       <measure name="Y">
        <src>"Type ,Name"
       </src></measure>
                                                    
       <feature name="Aa">
           <measure name="AaX">
              <src>"Type ,Name"                
                    "Type ,Name2"
                    "Type ,Name3"
           </src></measure>

           <measure name="AaY">
              <src>"Type ,Name"
                    "Type ,Name2"
                    "Type ,Name3"
           </src></measure>

           <feature name="Aab">
              .....
           </feature>

       </feature>

   </feature>

   <feature name="B">
     ......
   </feature>
</plan>

<plan name="HOLA">
</plan>

And the output is:

Plan: HELLO
Parent Feature: A
Name: X
['Type', 'Name']
Plan: HELLO
Parent Feature: A
Name: Y
['Type', 'Name']
Plan: HELLO
Parent Feature: Aa
Name: AaX
['Type', 'Name']
['Type', 'Name2']
['Type', 'Name3']
Plan: HELLO
Parent Feature: Aa
Name: AaY
['Type', 'Name']
['Type', 'Name2']
['Type', 'Name3']
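The loop above only prints each measure. If you want the nested structure from the question instead, one option (a sketch, not part of the original answer) is to keep the same three re.sub conversions but swap BeautifulSoup for the standard library's xml.etree.ElementTree. For this sample the converted text is well-formed XML once it is wrapped in a single root element, which is needed because there are several top-level <plan> elements; that assumption breaks if a name or src string contains characters such as & or <. The to_dict helper and its dict layout (kind, name, children, src) are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

text = """\
plan HELLO
   feature A
       measure X :
          src = "Type ,Name"
       endmeasure //X
       feature Aa
           measure AaX :
              src = "Type ,Name"
                    "Type ,Name2"
           endmeasure //AaX
       endfeature //Aa
   endfeature // A
endplan

plan HOLA
endplan //HOLA"""

# Same three substitutions as the answer: keywords -> tags, src strings -> <src>
data = re.sub(r"\b(plan|feature|measure)\s+([^:\s]+).*", r'<\g<1> name="\g<2>">', text)
data = re.sub(r"\b(?:end)(plan|feature|measure).*", r"</\g<1>>", data)
data = re.sub(r'src\s*=\s*((?:"[^"]+"\s*)+)', r"<src>\g<1></src>", data)

# Several <plan> elements sit at the top level, so wrap them in one root
root = ET.fromstring(f"<root>{data}</root>")


def to_dict(elem):
    """Recursively turn a <plan>, <feature> or <measure> element into a dict."""
    node = {"kind": elem.tag, "name": elem.get("name"), "children": [], "src": []}
    for child in elem:  # iterates child elements only, not text
        if child.tag == "src":
            # each quoted "Type ,Name" becomes a (Type, Name) tuple
            for quoted in re.findall(r'"([^"]*)"', child.text or ""):
                typ, _, nm = quoted.partition(",")
                node["src"].append((typ.strip(), nm.strip()))
        else:
            node["children"].append(to_dict(child))
    return node


plans = [to_dict(p) for p in root.findall("plan")]
# plan HELLO -> feature A -> feature Aa -> measure AaX
print(plans[0]["children"][0]["children"][1]["children"][0]["src"])
# [('Type', 'Name'), ('Type', 'Name2')]
```

The recursion in to_dict follows the element nesting directly, so features nested to any depth come out as nested children lists without any explicit stack.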