添加/替换 XML 标签的 Python 脚本

问题描述 投票:0回答:1

我有一个 Python 脚本,它应该在 XML 文档中查找现有标签,并将其替换为新的、更具描述性的标签。问题是,在我运行脚本后,它似乎只捕获我输入的文本字符串的每隔几个实例。我确信它的这种行为背后有一些原因,但我似乎无法弄清楚。

import xml.etree.ElementTree as ET
from lxml import etree

def replace_specific_line_tags(input_file, output_file, replacements):
    # Parse the XML file using lxml
    tree = etree.parse(input_file)
    root = tree.getroot()

    for target_text, replacement_tag in replacements:
        # Find all <line> tags with the specific target text under <content> and replace them with the new tag
        for line_tag in root.xpath('.//content/page/line[contains(., "{}")]'.format(target_text)):
            parent = line_tag.getparent()

            # Create the new tag with the desired tag name
            new_tag = etree.Element(replacement_tag)

            # Copy the attributes of the original <line> tag to the new tag
            for attr, value in line_tag.attrib.items():
                new_tag.set(attr, value)

            # Copy the text of the original <line> tag to the new tag
            new_tag.text = line_tag.text

            # Replace the original <line> tag with the new tag
            parent.replace(line_tag, new_tag)

    # Write the updated XML back to the file
    with open(output_file, 'wb') as f:
        tree.write(f, encoding='utf-8', xml_declaration=True)

if __name__ == '__main__':
    input_file_name = 'beforeTagEdits.xml'
    output_file_name = 'afterTagEdits.xml'
    
    # List of target texts and their corresponding replacement tags
    replacements = [
        ('The Washington Post', 'title'),

        # Add more target texts and their replacement tags as needed
    ]
    
    replace_specific_line_tags(input_file_name, output_file_name, replacements)

由于代码正在运行,只是不完全符合预期,我尝试更改一些文本字符串以匹配原始文件中已知的确切字符串,但这似乎并不能解决问题。以下是当前 XML 文档的示例:

<root>
     <content>
          <line>The Washington Post</line>
          <line>The Washington Post</line>
     </content>
</root>
python xml xml-parsing text-parsing
1个回答
0
投票

您可以通过根目录进行 iter() 并使用搜索到的文本重命名标签:

import xml.etree.ElementTree as ET

xml= """<root>
     <content>
          <line>The Washington Post</line>
          <line>The Washington Post</line>
          <tag>Spiegel</tag>
     </content>
</root>"""

root = ET.fromstring(xml)

pattern ={'title':'The Washington Post', 'title':'Spiegel'}

for k, v in pattern.items():
    for elem in root.iter():
        if elem.text == v:
            elem.tag = k
            
ET.dump(root)

输出:

<root>
     <content>
          <line>The Washington Post</line>
          <line>The Washington Post</line>
          <title>Spiegel</title>
     </content>
</root>
© www.soinside.com 2019 - 2024. All rights reserved.