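I am trying to write a Python script that parses an XML file and generates a CSV file for each table in the XML. Each table should include its attributes. In addition, I want to create a CSV file that represents the relationships between those tables.
Here is my code: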
import xml.etree.ElementTree as ET
import csv


def extract_tables_and_attributes(xml_file):
    parser = ET.XMLParser(encoding="windows-1252")
    tree = ET.parse(xml_file, parser=parser)
    root = tree.getroot()
    tables = root.findall(".//{http://www.omg.org/spec/UML/20090901}Class")
    table_data = []
    for table in tables:
        table_name = table.find("{http://www.omg.org/spec/UML/20090901}name").text
        attributes = table.findall(".//{http://www.omg.org/spec/UML/20090901}Property")
        table_attributes = []
        for attr in attributes:
            attr_name = attr.find("{http://www.omg.org/spec/UML/20090901}name").text
            attr_type = attr.find("{http://www.omg.org/spec/UML/20090901}type").text
            table_attributes.append([attr_name, attr_type])
        table_data.append((table_name, table_attributes))
    return table_data


def export_to_csv(table_data):
    for table_name, attributes in table_data:
        csv_file_name = f"{table_name}.csv"
        with open(csv_file_name, 'w', newline='') as csvfile:
            csv_writer = csv.writer(csvfile)
            csv_writer.writerow(['Attribute Name', 'Attribute Type'])
            for attr_name, attr_type in attributes:
                csv_writer.writerow([attr_name, attr_type])


if __name__ == "__main__":
    xml_file = r"Gemeentelijk Gegevensmodel XMI2.1.2.xml"
    table_data = extract_tables_and_attributes(xml_file)
    export_to_csv(table_data)
However, I get the following error:
Traceback (most recent call last):
  File "ggm to csv.py", line 39, in <module>
    table_data = extract_tables_and_attributes(xml_file)
  File "ggm to csv.py", line 6, in extract_tables_and_attributes
    tree = ET.parse(xml_file, parser=parser)
  File "xml\etree\ElementTree.py", line 1203, in parse
    tree.parse(source, parser)
  File "xml\etree\ElementTree.py", line 571, in parse
    parser.feed(data)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 71765, column 176
This is the XML file I am using, and it does not appear to have any well-formedness problems. It would be great if someone could help me solve this.
https://github.com/Gegevensmodel%20XMI2.1.2.xml
I have tried running several versions of the script against different XML (UML) exports from Enterprise Architect, but I keep hitting the same error on the same line.
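Since the parser reports an exact position, one way to investigate is to look at the raw bytes around it. The following is a minimal sketch, not a fix: the helper names `find_parse_error` and `show_parse_error_context` are hypothetical, and it assumes the reported column can be treated as a 0-based byte offset into the line.

```python
import xml.etree.ElementTree as ET


def find_parse_error(path):
    """Return (line, column) of the first parse error, or None if the file parses."""
    try:
        ET.parse(path)
    except ET.ParseError as exc:
        # ParseError.position is a (line, column) tuple.
        return exc.position
    return None


def show_parse_error_context(path, line, column, width=40):
    """Return the raw bytes around the given position.

    Assumption: `line` is 1-based and `column` is a 0-based byte offset.
    """
    with open(path, "rb") as fh:
        for lineno, raw in enumerate(fh, start=1):
            if lineno == line:
                start = max(column - width, 0)
                return raw[start:column + width]
    return b""


if __name__ == "__main__":
    xml_file = r"Gemeentelijk Gegevensmodel XMI2.1.2.xml"
    position = find_parse_error(xml_file)
    if position is not None:
        # repr() of bytes makes control characters such as \x01 visible.
        print(position, show_parse_error_context(xml_file, *position))
```

Printing the bytes rather than decoded text makes characters that are not allowed in XML (for example stray control characters) visible, which an editor may silently hide.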
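Edit:
The desired result looks like this (the table for the class "PAND" with its attributes and their types; this is a single table, but I expect many CSV files, each holding its own table):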
Attribute Name,Attribute Type
Bruto inhoud pand,EAJava_N6
Datum begin geldigheid pand,EAJava_DATUM
Datum einde geldigheid pand,EAJava_DATUM
GeometriePunt,EAJava_GM_Point
Hoogste bouwlaag pand,EAJava_N3
Identificatie BGTND,EAJava_NEN3610ID
Ind planobject,EAJava_INDIC
Indicatie geconstateerd,EAJava_INDIC
Inwinning geometrie bovenaanzicht,EAJava_GM_Object
Inwinning geometrie maaiveld,EAJava_GM_Object
Laagste bouwlaag pand,EAJava_N3
Label nummeraanduidingreeks,EAJava_C74E1553_32AE_4fd8_9796_00C6E1C51A11
Lod1 geometrie pand,EAJava_GM_Object
Lod2 geometrie pand,EAJava_GM_Object
Lod3 geometrie pand,EAJava_GM_Object
Oorspronkelijk bouwjaar pand,EAJava_JAAR
Oppervlakte pand,EAJava_N6
Pandgeometrie bovenaanzicht,EAJava_GM_Surface
Pandgeometrie maaild,EAJava_GM_MultiSurface
Pandidentificatie,EAJava_AB8B30D0_FD1F_4c44_9396_BB05389EA20B
Pandstatus,EAJava_E2CC5DFC_C264_4c21_8E47_F551958E1C17
Relatieve hoogteligging pand,EAJava_N2
Status voortgang bouw,EAJava_8C49F097_6D95_4406_B3B7_58AC102B6FD2
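It is not entirely clear to me what you are searching for. Could you give a simple example of what your search pattern looks like?
With lxml you can parse this file directly from GitHub: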
from urllib.request import urlopen
from lxml import etree
import psutil
import time

time_start = time.time()

url = "https://raw.githubusercontent.com/Gemeente-Delft/Gemeentelijk-Gegevensmodel/master/"
file = "Gemeentelijk%20Gegevensmodel%20XMI2.1.2.xml"
fd = url+file
f = urlopen(fd)

for event, elem in etree.iterparse(f, events=['start-ns', 'end'], recover=True):
    if event == "start-ns":
        #print(elem[0])
        pass
    if event == "end" and elem.tag =="packagedElement" and elem.get('{http://schema.omg.org/spec/XMI/2.1}type')=='uml:Class':
        for prob in elem.findall("./ownedAttribute"):
            print(prob.get('name'))

print("RAM:")
print(psutil.Process().memory_info().rss / (1024 * 1024))
print("Time:")
print((time.time() - time_start))
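To get from the printed names to the CSV layout requested in the question, the same element lookups can feed `csv.writer`. The following is a rough sketch under stated assumptions, not a tested solution for the full file: the function names and the `SAMPLE` document are made up for illustration, and it assumes each attribute's type is referenced by a child element of the form `<type xmi:idref="..."/>`, which should be checked against the real export. It uses the standard library parser on a small well-formed sample; lxml elements offer the same `iter`, `findall`, `find` and `get` calls.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace of the xmi:type attribute, as used in the lxml snippet above.
XMI_NS = "http://schema.omg.org/spec/XMI/2.1"

# Hypothetical miniature document, for illustration only.
SAMPLE = """\
<xmi:XMI xmlns:xmi="http://schema.omg.org/spec/XMI/2.1"
         xmlns:uml="http://schema.omg.org/spec/UML/2.1">
  <uml:Model name="demo">
    <packagedElement xmi:type="uml:Class" name="PAND">
      <ownedAttribute xmi:type="uml:Property" name="Bruto inhoud pand">
        <type xmi:idref="EAJava_N6"/>
      </ownedAttribute>
      <ownedAttribute xmi:type="uml:Property" name="Oppervlakte pand">
        <type xmi:idref="EAJava_N6"/>
      </ownedAttribute>
    </packagedElement>
    <packagedElement xmi:type="uml:Package" name="not a class"/>
  </uml:Model>
</xmi:XMI>
"""


def iter_class_attributes(root):
    """Yield (class_name, [(attr_name, attr_type), ...]) for each uml:Class."""
    for elem in root.iter("packagedElement"):
        if elem.get(f"{{{XMI_NS}}}type") != "uml:Class":
            continue
        rows = []
        for attr in elem.findall("./ownedAttribute"):
            name = attr.get("name")
            if name is None:
                # Choice made for this sketch: skip unnamed entries.
                continue
            type_elem = attr.find("type")
            # Assumption: the type is given as <type xmi:idref="..."/>.
            attr_type = type_elem.get(f"{{{XMI_NS}}}idref", "") if type_elem is not None else ""
            rows.append((name, attr_type))
        yield elem.get("name"), rows


def write_class_csv(out, rows):
    """Write one class's attribute rows to an open text file object."""
    writer = csv.writer(out)
    writer.writerow(["Attribute Name", "Attribute Type"])
    writer.writerows(rows)


if __name__ == "__main__":
    for class_name, rows in iter_class_attributes(ET.fromstring(SAMPLE)):
        buffer = io.StringIO()
        write_class_csv(buffer, rows)
        print(class_name)
        print(buffer.getvalue())
```

To write real files, `write_class_csv` can be called with `open(f"{class_name}.csv", "w", newline="")` instead of the in-memory buffer. Class names may contain characters that are not valid in file names, so they may need cleaning first.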