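As a beginner in development, I may be posting in the wrong topic; if so, sorry. I have been writing a Python script that validates whether an XML fiscal note is compatible with an XML layout, using import xml.etree.ElementTree as ET. The script can already check whether the tags of the two XML files are similar. Now I want to improve it so that it also checks whether the tags appear in the same order, but there is another problem :D. Some layouts accept a note even when it does not match the layout exactly, for example when the first tag of the layout is missing from the note, or when the note contains an extra tag.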
import xml.etree.ElementTree as ET

# Function to calculate similarity between two sets
def calculate_similarity(set1, set2):
    intersection = set1.intersection(set2)
    smallest_set_size = min(len(set1), len(set2))
    if smallest_set_size == 0:
        return 0
    similarity = len(intersection) / smallest_set_size
    return similarity

# Function to validate a fiscal note against the layouts
def validate_fiscal_note(fiscal_note_xml, similarity_threshold=0.1):
    fiscal_note_tree = ET.parse(fiscal_note_xml)
    fiscal_note_root = fiscal_note_tree.getroot()
    fiscal_note_tags = {child.tag for child in fiscal_note_root.iter()}
    best_match_code = None
    best_match_similarity = 0
    best_match_code_list = []
    for code, layout_tags in layouts_dict.items():
        similarity = calculate_similarity(layout_tags, fiscal_note_tags)
        best_match_code_list.append({'code': code, 'similarity': similarity})
    # Ascending order: the best matches end up at the end of the list
    sorted_list = sorted(best_match_code_list, key=lambda x: x['similarity'])
    if not sorted_list:
        return 'No layout was found'
    else:
        return sorted_list

# Parse the layouts XML
layouts_tree = ET.parse('layouts.xml')
layouts_root = layouts_tree.getroot()

# Build a dictionary of layouts
layouts_dict = {}
for layout in layouts_root:
    layout_code = layout.attrib['code']  # Make sure this matches your XML structure; might need to adjust
    tags = {child.tag for child in layout}
    layouts_dict[layout_code] = tags

# Validate the fiscal note and print the five best matches
result = validate_fiscal_note('Nota 2.xml')
print(result[-5:])
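How should I approach this? I want to validate both the order and the similarity of the tags, but still accept a note when it does not differ too much from the layout.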
You can compare the two files with pyxml2xpath, which collects all the XPath expressions found in each file.
The order of the elements can be checked as well.
from xml2xpath import xml2xpath

tree, nsmap, xmap = xml2xpath.parse('/home/luis/tmp/S1.xml')
tree2, nsmap2, xmap2 = xml2xpath.parse('/home/luis/tmp/S2.xml')

# Dict equality ignores order, so this only checks that both contain the same items
if xmap == xmap2:
    print("Identical content", len(xmap.items()), len(xmap2.items()))
else:
    print("Different")

# Compare the items pairwise to detect order mismatches
# (zip stops at the shorter map, so maps of different sizes do not raise IndexError)
for x, y in zip(xmap.items(), xmap2.items()):
    if x != y:
        print(f"Xpath order mismatch: {x} - {y}")
    else:
        print(f"Xpath match: {x} - {y}")
Given these two files (S1.xml first, then S2.xml):
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<root>
    <ele1>
        <child1>1</child1>
        <child2>2</child2>
    </ele1>
    <ele2>
        <child1>1</child1>
        <child2>2</child2>
    </ele2>
</root>
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<root>
    <ele1>
        <child1>1</child1>
        <child2>2</child2>
    </ele1>
    <ele2>
        <child2>2</child2>
        <child1>1</child1>
    </ele2>
</root>
Result:
Identical content 7 7
Xpath match: ('/root', ['/root', 1.0, None]) - ('/root', ['/root', 1.0, None])
Xpath match: ('/root/ele1', ['/root/ele1', 1.0, None]) - ('/root/ele1', ['/root/ele1', 1.0, None])
Xpath match: ('/root/ele1/child1', ['/root/ele1/child1', 1.0, None]) - ('/root/ele1/child1', ['/root/ele1/child1', 1.0, None])
Xpath match: ('/root/ele1/child2', ['/root/ele1/child2', 1.0, None]) - ('/root/ele1/child2', ['/root/ele1/child2', 1.0, None])
Xpath match: ('/root/ele2', ['/root/ele2', 1.0, None]) - ('/root/ele2', ['/root/ele2', 1.0, None])
Xpath order mismatch: ('/root/ele2/child1', ['/root/ele2/child1', 1.0, None]) - ('/root/ele2/child2', ['/root/ele2/child2', 1.0, None])
Xpath order mismatch: ('/root/ele2/child2', ['/root/ele2/child2', 1.0, None]) - ('/root/ele2/child1', ['/root/ele2/child1', 1.0, None])
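The comparison above reports where the order differs, but the question also asks for some tolerance: accepting a note that is close to a layout without matching it exactly. One possible way to get an order-aware similarity score is difflib.SequenceMatcher from the standard library, applied to the tags in document order. This is only a sketch under assumptions: the helper names (tag_sequence, order_similarity, tag_differences) and the 0.8 threshold are hypothetical and would have to be tuned against real notes and layouts.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def tag_sequence(xml_text):
    """Return every tag in document order (a list keeps order, a set does not)."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


def order_similarity(layout_tags, note_tags):
    """Score between 0 and 1; 1.0 means the same tags in the same order."""
    matcher = SequenceMatcher(None, layout_tags, note_tags, autojunk=False)
    return matcher.ratio()


def tag_differences(layout_tags, note_tags):
    """List the non-matching spans as (operation, layout tags, note tags)."""
    matcher = SequenceMatcher(None, layout_tags, note_tags, autojunk=False)
    return [
        (op, layout_tags[i1:i2], note_tags[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


# Same structure as the two sample files, with the text content left out
LAYOUT = "<root><ele1><child1/><child2/></ele1><ele2><child1/><child2/></ele2></root>"
NOTE = "<root><ele1><child1/><child2/></ele1><ele2><child2/><child1/></ele2></root>"

THRESHOLD = 0.8  # hypothetical tolerance, not taken from the question

layout_tags = tag_sequence(LAYOUT)
note_tags = tag_sequence(NOTE)
score = order_similarity(layout_tags, note_tags)
accepted = score >= THRESHOLD

print(f"similarity={score:.3f} accepted={accepted}")
for op, expected, found in tag_differences(layout_tags, note_tags):
    print(op, expected, found)
```

A missing first tag or one extra tag lowers the score only slightly, so such a note can still pass the threshold, while the reported differences show exactly which tags were inserted, deleted, or replaced.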
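Note: I am the author of pyxml2xpath. It will be available on the PyPI index soon, but for now it has to be built from source. As described in the project's README.md, that is fairly simple.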
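If building from source is not an option, a rough stand-in can be written with xml.etree.ElementTree alone. The sketch below is not pyxml2xpath and does not reproduce its output format: ordered_paths is a hypothetical helper that returns plain slash-separated tag paths in document order, keeps repeated paths, and leaves namespaced tags in ElementTree's {uri}tag form.

```python
import xml.etree.ElementTree as ET


def ordered_paths(xml_text):
    """Return the tag path of every element, in document order."""
    paths = []

    def walk(element, prefix):
        path = f"{prefix}/{element.tag}"
        paths.append(path)
        for child in element:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


S1 = "<root><ele1><child1/><child2/></ele1><ele2><child1/><child2/></ele2></root>"
S2 = "<root><ele1><child1/><child2/></ele1><ele2><child2/><child1/></ele2></root>"

paths1 = ordered_paths(S1)
paths2 = ordered_paths(S2)

# Same paths overall, but not in the same order
print("same content:", sorted(paths1) == sorted(paths2))
print("same order:", paths1 == paths2)
for left, right in zip(paths1, paths2):
    if left != right:
        print(f"order mismatch: {left} - {right}")
```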