How to compare an XML layout with an XML fiscal note using Python

Question

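I'm a beginner at development, so I may be posting in the wrong topic; if so, sorry. I have been writing a Python script that validates whether an XML fiscal note is compatible with an XML layout. Using import xml.etree.ElementTree as ET, I can already check whether the tags of the two XML files are similar. Now I want to improve the script so that it also checks whether the tags appear in the same order. But there is another problem :D, some layouts accept a note even when it does not match the layout exactly, for example when the first tag of the layout is missing from the note, or when the note contains an extra tag.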

import xml.etree.ElementTree as ET

# Function to calculate similarity between two sets
def calculate_similarity(set1, set2):
    intersection = set1.intersection(set2)
    smallest_set_size = min(len(set1), len(set2))
    if smallest_set_size == 0:
        return 0
    similarity = len(intersection) / smallest_set_size
    return similarity


# Function to validate a fiscal note against the layouts
def validate_fiscal_note(fiscal_note_xml, similarity_threshold=0.1):
    fiscal_note_tree = ET.parse(fiscal_note_xml)
    fiscal_note_root = fiscal_note_tree.getroot()
    
    fiscal_note_tags = {child.tag for child in fiscal_note_root.iter()}

    best_match_code = None
    best_match_similarity = 0
    best_match_code_list = []

    for code, layout_tags in layouts_dict.items():
        similarity = calculate_similarity(layout_tags, fiscal_note_tags)
        best_match_code_list.append({'code': code, 'similarity': similarity})
    
    # Sort ascending by similarity, so the best matches end up last
    sorted_list = sorted(best_match_code_list, key=lambda x: x['similarity'])

    if not sorted_list:
        return 'No layout was found'
    else:
        return sorted_list

# Parse the layouts XML
layouts_tree = ET.parse('layouts.xml')
layouts_root = layouts_tree.getroot()

# Build a dictionary of layouts
layouts_dict = {}
for layout in layouts_root:
    layout_code = layout.attrib['code']  # Make sure this matches your XML structure; might need to adjust
    tags = {child.tag for child in layout}
    layouts_dict[layout_code] = tags

# Validate the fiscal note and print the result
result = validate_fiscal_note('Nota 2.xml')
print(result[-5:])

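How should I handle this? I want to validate both the order and the similarity of the tags, but still accept the note when it is not too different from the layout.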

Answer 1

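You can compare the two files with pyxml2xpath, which extracts all possible XPath expressions from each file.
It can also be used to check the order of the elements.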

from xml2xpath import xml2xpath

tree, nsmap, xmap = xml2xpath.parse('/home/luis/tmp/S1.xml')

tree2, nsmap2, xmap2 = xml2xpath.parse('/home/luis/tmp/S2.xml')
# both contain same items
if xmap == xmap2:
    print("Identical content", len(xmap.items()), len(xmap2.items()))
else:
    print("Different")

# compare the XPath entries pairwise to check the element order
# (zip stops at the shorter map, so files of different sizes do not raise IndexError)
for x, y in zip(xmap.items(), xmap2.items()):
    if x != y:
        print(f"Xpath order mismatch: {x} - {y}")
    else:
        print(f"Xpath match: {x} - {y}")

Given these files

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<root>
  <ele1>
    <child1>1</child1>
    <child2>2</child2>
  </ele1>
  <ele2>
    <child1>1</child1>
    <child2>2</child2>
  </ele2>
</root>

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<root>
  <ele1>
    <child1>1</child1>
    <child2>2</child2>
  </ele1>
  <ele2>
    <child2>2</child2>
    <child1>1</child1>
  </ele2>
</root>

Result

Identical content 7 7

Xpath match: ('/root', ['/root', 1.0, None]) - ('/root', ['/root', 1.0, None])
Xpath match: ('/root/ele1', ['/root/ele1', 1.0, None]) - ('/root/ele1', ['/root/ele1', 1.0, None])
Xpath match: ('/root/ele1/child1', ['/root/ele1/child1', 1.0, None]) - ('/root/ele1/child1', ['/root/ele1/child1', 1.0, None])
Xpath match: ('/root/ele1/child2', ['/root/ele1/child2', 1.0, None]) - ('/root/ele1/child2', ['/root/ele1/child2', 1.0, None])
Xpath match: ('/root/ele2', ['/root/ele2', 1.0, None]) - ('/root/ele2', ['/root/ele2', 1.0, None])
Xpath order mismatch: ('/root/ele2/child1', ['/root/ele2/child1', 1.0, None]) - ('/root/ele2/child2', ['/root/ele2/child2', 1.0, None])
Xpath order mismatch: ('/root/ele2/child2', ['/root/ele2/child2', 1.0, None]) - ('/root/ele2/child1', ['/root/ele2/child1', 1.0, None])

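Note: I am the author of pyxml2xpath. It will be on the PyPI index soon, but for now it has to be built from source. As described in the project's README.md, that is fairly simple.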
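
The question also asks about accepting a note that is "not too different" from the layout, which a strict equality check does not cover. One possible approach, sketched here with the standard library only (it is not part of pyxml2xpath), is to turn each tree into a list of element paths in document order and compare the two lists with difflib.SequenceMatcher. The helper names tag_paths and compare_order and the 0.7 threshold are assumptions made for illustration; the threshold would need tuning against real layouts.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def tag_paths(root):
    """Return the path of every element, in document order."""
    paths = []

    def walk(element, prefix):
        path = f"{prefix}/{element.tag}"
        paths.append(path)
        for child in element:
            walk(child, path)

    walk(root, "")
    return paths


def compare_order(layout_root, note_root, threshold=0.7):
    """Compare two element trees by their ordered tag paths.

    Returns (ratio, accepted, differences). The threshold is an
    arbitrary illustrative value, not something taken from the question.
    """
    layout_paths = tag_paths(layout_root)
    note_paths = tag_paths(note_root)
    matcher = SequenceMatcher(None, layout_paths, note_paths, autojunk=False)
    # Keep only the spans where the two sequences disagree
    # (missing, extra, or out-of-order tags).
    differences = [
        (op, layout_paths[i1:i2], note_paths[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]
    ratio = matcher.ratio()
    return ratio, ratio >= threshold, differences


# Small inline documents so the sketch runs without any files.
layout = ET.fromstring("<root><ele2><child1/><child2/></ele2></root>")
note = ET.fromstring("<root><ele2><child2/><child1/></ele2></root>")

ratio, accepted, differences = compare_order(layout, note)
print(ratio, accepted)
for difference in differences:
    print(difference)
```

Because the ratio is computed on ordered sequences, swapped, missing, or extra tags all lower it, and the list of differences shows which paths caused the drop. With real files, the roots could come from ET.parse(...).getroot() as in the question.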
