We want to parse a file and build some kind of data structure from it for later use (in Python). The file content looks like this:
plan HELLO
    feature A
        measure X :
            src = "Type ,Name"
        endmeasure //X
        measure Y :
            src = "Type ,Name"
        endmeasure //Y
        feature Aa
            measure AaX :
                src = "Type ,Name"
                      "Type ,Name2"
                      "Type ,Name3"
            endmeasure //AaX
            measure AaY :
                src = "Type ,Name"
            endmeasure //AaY
            feature Aab
                .....
            endfeature // Aab
        endfeature //Aa
    endfeature // A
    feature B
        ......
    endfeature //B
endplan
plan HOLA
endplan //HOLA
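So the file contains one or more plans, each plan contains one or more features, and each feature contains measures that hold information (src, type, name). A feature can also contain further nested features.
We need to parse the file and create a data structure like this: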
                         plan (HELLO)
                ------------------------------
                ↓                            ↓
            Feature A                    Feature B
   ----------------------------              ↓
   ↓            ↓             ↓           ........
Measure X   Measure Y    Feature Aa
                ------------------------------
                ↓             ↓              ↓
           Measure AaX   Measure AaY    Feature Aab
                                             ↓
                                          .......
I am trying to parse the file line by line and build a list of lists that holds plan -> feature -> measure, feature:
def getplans(s):
    stack = [{}]
    stack_list = []
    for line in s.splitlines():
        if ": " in line:  # leaf
            temp_stack = {}
            key, value = line.split(": ", 1)
            key = key.replace("source","").replace("=","").replace("\"","").replace(";","")
            value = value.replace("\"","").replace(",","").replace(";","")
            temp_stack[key.strip()] = value.strip()
            stack_list.append(temp_stack)
            stack[-1]["MEASURED_VAL"] = stack_list
        elif line.strip()[:3] == "end":
            stack.pop()
            stack_list = []
        elif line.strip():
            collection, name, *_ = line.split()
            stack.append({})
            stack[-2].setdefault(collection, {})[name] = stack[-1]
    return stack[0]
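For comparison, here is a minimal sketch of how the line-by-line stack idea could be made to work on the sample format. It is only an illustration under assumptions: the function name `parse_plans` and the node layout (`kind`, `name`, `children`, `src`) are hypothetical, lines made only of dots are treated as elided content, and every `plan`/`feature`/`measure` line is assumed to be closed by a matching `end...` line.

```python
import re

def parse_plans(text):
    """Parse plan/feature/measure blocks into nested dict nodes (hypothetical layout)."""
    root = {"kind": "root", "name": None, "children": [], "src": []}
    stack = [root]  # stack[-1] is the block currently being filled
    for raw in text.splitlines():
        line = raw.strip()
        if not line or set(line) == {"."}:
            continue  # skip blank lines and "....." placeholders
        first = line.split()[0]
        if first in ("plan", "feature", "measure"):
            # opening line, e.g. "measure X :" -> kind "measure", name "X"
            name = line.split()[1].rstrip(":")
            node = {"kind": first, "name": name, "children": [], "src": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif first.startswith("end"):
            stack.pop()  # closing line, e.g. "endmeasure //X"
        else:
            # 'src = "Type ,Name"' or a continuation line '"Type ,Name2"'
            m = re.search(r'"([^"]*)"', line)
            if m:
                pair = tuple(p.strip() for p in m.group(1).split(","))
                stack[-1]["src"].append(pair)
    return root["children"]

sample = """\
plan HELLO
feature A
measure X :
src = "Type ,Name"
endmeasure //X
feature Aa
measure AaX :
src = "Type ,Name"
"Type ,Name2"
endmeasure //AaX
endfeature //Aa
endfeature // A
endplan
plan HOLA
endplan //HOLA"""

plans = parse_plans(sample)
print([p["name"] for p in plans])
```

Keeping measures and nested features in one `children` list preserves their order in the file; the `kind` field tells them apart.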
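Looking at the file, I would try converting the plan/feature/measure keywords into tags and then parsing the result with an HTML parser, for example beautifulsoup (or you could try converting it to YAML and then using a YAML parser):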
text = """\
plan HELLO
feature A
measure X :
src = "Type ,Name"
endmeasure //X
measure Y :
src = "Type ,Name"
endmeasure //Y
feature Aa
measure AaX :
src = "Type ,Name"
"Type ,Name2"
"Type ,Name3"
endmeasure //AaX
measure AaY :
src = "Type ,Name"
"Type ,Name2"
"Type ,Name3"
endmeasure //AaY
feature Aab
.....
endfeature // Aab
endfeature //Aa
endfeature // A
feature B
......
endfeature //B
endplan
plan HOLA
endplan //HOLA"""
import re
from bs4 import BeautifulSoup
data = re.sub(r"\b(plan|feature|measure)\s+([^:\s]+).*", r'<\g<1> name="\g<2>">', text)
data = re.sub(r"\b(?:end)(plan|feature|measure).*", r"</\g<1>>", data)
data = re.sub(r'src\s*=\s*((?:"[^"]+"\s*)+)', r"<src>\g<1></src>", data)
soup = BeautifulSoup(data, "html.parser")
for m in soup.select("measure"):
    # find the enclosing plan:
    print("Plan:", m.find_parent("plan")["name"])
    # find the nearest enclosing feature:
    print("Parent Feature:", m.find_parent("feature")["name"])
    print("Name:", m["name"])
    for line in m.text.splitlines():
        # '"Type ,Name"' -> ['Type', 'Name']
        parts = list(map(str.strip, line.strip(' "').split(",")))
        if len(parts) == 2:
            print(parts)
The converted text will be:
<plan name="HELLO">
<feature name="A">
<measure name="X">
<src>"Type ,Name"
</src></measure>
<measure name="Y">
<src>"Type ,Name"
</src></measure>
<feature name="Aa">
<measure name="AaX">
<src>"Type ,Name"
"Type ,Name2"
"Type ,Name3"
</src></measure>
<measure name="AaY">
<src>"Type ,Name"
"Type ,Name2"
"Type ,Name3"
</src></measure>
<feature name="Aab">
.....
</feature>
</feature>
</feature>
<feature name="B">
......
</feature>
</plan>
<plan name="HOLA">
</plan>
And the output is:
Plan: HELLO
Parent Feature: A
Name: X
['Type', 'Name']
Plan: HELLO
Parent Feature: A
Name: Y
['Type', 'Name']
Plan: HELLO
Parent Feature: Aa
Name: AaX
['Type', 'Name']
['Type', 'Name2']
['Type', 'Name3']
Plan: HELLO
Parent Feature: Aa
Name: AaY
['Type', 'Name']
['Type', 'Name2']
['Type', 'Name3']
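If a nested structure is wanted rather than printed lines, the same tagged text can be walked recursively. The sketch below swaps BeautifulSoup for the standard library's `xml.etree.ElementTree` so that it has no third-party dependency. The helper names `to_tags` and `to_dict` are hypothetical, and it assumes block names contain no XML-special characters such as `<` or `&`, and that elided `.....` lines have been left out of the input.

```python
import re
import xml.etree.ElementTree as ET

text = """\
plan HELLO
feature A
measure X :
src = "Type ,Name"
endmeasure //X
feature Aa
measure AaX :
src = "Type ,Name"
"Type ,Name2"
"Type ,Name3"
endmeasure //AaX
endfeature //Aa
endfeature // A
endplan
plan HOLA
endplan //HOLA"""

def to_tags(text):
    """Apply the same three substitutions as above to get tag-like markup."""
    data = re.sub(r"\b(plan|feature|measure)\s+([^:\s]+).*", r'<\g<1> name="\g<2>">', text)
    data = re.sub(r"\b(?:end)(plan|feature|measure).*", r"</\g<1>>", data)
    data = re.sub(r'src\s*=\s*((?:"[^"]+"\s*)+)', r"<src>\g<1></src>", data)
    return data

def to_dict(elem):
    """Turn an element into nested dicts; a measure becomes a list of [type, name] pairs."""
    if elem.tag == "measure":
        src = elem.findtext("src", default="")
        return [[p.strip() for p in s.split(",")] for s in re.findall(r'"([^"]+)"', src)]
    node = {}
    for child in elem:
        # group children by tag, then by their name attribute
        node.setdefault(child.tag, {})[child.get("name")] = to_dict(child)
    return node

# The file may hold several plans, so wrap them in a single root element.
root = ET.fromstring("<root>" + to_tags(text) + "</root>")
tree = to_dict(root)
print(tree["plan"]["HELLO"]["feature"]["A"]["measure"]["X"])
```

Grouping by tag and then by name gives lookups such as `tree["plan"]["HELLO"]["feature"]["A"]["feature"]["Aa"]`, at the cost of losing the original order between measures and features.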