I have been googling how to remove grandchildren from an XML file, but I haven't found a perfect solution yet. Here is my case:
<tree>
<category title="Item 1">item 1 text
<subitem title="subitem1">subitem1 text</subitem>
<subitem title="subitem2">subitem2 text</subitem>
</category>
<category title="Item 2">item 2 text
<subitem title="subitem21">subitem21 text</subitem>
<subitem title="subitem22">subitem22 text</subitem>
<subsubitem title="subsubitem211">subsubitem211 text</subsubitem>
</category>
</tree>
In some cases I want to remove subitem; in other cases I want to remove subsubitem. I know I can do it like this for the content given above:
import xml.etree.ElementTree as ET

root = ET.fromstring(given_content)

# case 1
for item in list(root.iter()):  # iter() replaces getiterator(), removed in Python 3.9
    for subitem in list(item):  # iterate over a copy while removing
        item.remove(subitem)

# case 2
for item in list(root.iter()):
    for subitem in item:
        for subsubitem in list(subitem):
            subitem.remove(subsubitem)
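I can only write it this way when I know the depth of the target node. If I only know the tag name of the nodes to remove, how should I implement it?
Pseudo-code: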
import xml.etree.ElementTree as ET

root = ET.fromstring(given_content)
for item in root.iter():
    if item.tag == 'subsubitem' or item.tag == 'subitem':
        pass  # remove item here
If I call root.remove(item), it will definitely raise an error, because item is not a direct child of root.
Edit: I can't install any third-party libraries, so I have to solve this with xml alone.
I got this done using only the xml lib.
def recursive_xml(root):
    # list(root) copies the children, so removing one does not skip the next.
    # (getchildren() was removed in Python 3.9.)
    for child in list(root):
        if child.tag == 'subitem' or child.tag == 'subsubitem':
            root.remove(child)
        else:
            recursive_xml(child)
This way, the function visits every node in the tree and removes my target nodes.
test_xml = r'''
<test>
<test1>
<test2>
<test3>
</test3>
<subsubitem>
</subsubitem>
</test2>
<subitem>
</subitem>
<nothing_matters>
</nothing_matters>
</test1>
</test>
'''
root = ET.fromstring(test_xml)
recursive_xml(root)
Hope this helps anyone with the same restrictions as me...
To remove every instance of subsubitem or subitem, regardless of depth, consider the following example (with the caveat that it uses lxml.etree rather than upstream ElementTree):
import lxml.etree as etree
el = etree.fromstring('<root><item><subitem><subsubitem/></subitem></item></root>')
for child in el.xpath('.//subsubitem | .//subitem'):
    child.getparent().remove(child)
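If lxml is not an option, a similar effect may be possible with the standard library alone. ElementTree elements have no getparent(), so one common workaround is to build a child-to-parent map first. The sketch below is only one way to do it; the helper name remove_by_tag and the sample XML are illustrative assumptions, not something taken from the answers here.

```python
import xml.etree.ElementTree as ET


def remove_by_tag(root, tags):
    """Remove every descendant of root whose tag is in tags; return the count."""
    # Snapshot all child -> parent pairs before mutating anything.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    removed = 0
    for child, parent in parent_of.items():
        if child.tag in tags:
            parent.remove(child)  # the element's tail text is dropped with it
            removed += 1
    return removed


root = ET.fromstring(
    '<root><item><subitem><subsubitem/></subitem></item><keep/></root>'
)
count = remove_by_tag(root, {'subitem', 'subsubitem'})
result = ET.tostring(root, encoding='unicode')
print(count, result)
```

Because the map is built before any removal, the loop never iterates over a structure it is changing. The root itself is never removed, since it has no parent in the map.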
https://docs.python.org/3/library/xml.etree.elementtree.html#modifying-an-xml-file
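Note that modifying a tree while iterating over it can cause problems, just as it does when iterating over and modifying Python lists or dicts. This script runs on Python 3.10: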
#!/usr/bin/python
import xml.etree.ElementTree as ET
def print_xmltree(root):
    xmlstr = ET.tostring(root, encoding="utf-8", method="xml")
    print(xmlstr.decode("utf-8"))

def recursive_xml(parent, depth):
    #print(depth * " ", parent.findall('./'))
    for child in parent.findall('./'):
        if child.tag == 'subitem' or child.tag == 'subsubitem':
            parent.remove(child)
        else:
            recursive_xml(child, depth + 1)

xml_data = """<?xml version="1.0" encoding="UTF-8"?>
<tree>
<category title="Item 1">item 1 text
<subitem title="subitem1">subitem11 text</subitem>
<subitem title="subitem2">subitem12 text</subitem>
<sibetum title="subitem3">subitem13 text</sibetum>
<subsubitem title="subsubitem1">subsubitem211 text</subsubitem>
</category>
<category title="Item 2">item 2 text
<subitem title="subitem1">subitem21 text</subitem>
<subitem title="subitem2">subitem22 text</subitem>
<subsubitem title="subsubitem1">subsubitem211 text</subsubitem>
<sobsobitem title="subsubitem2">wrong tag</sobsobitem>
</category>
<category title="Item 3">item 3 text
</category>
</tree>"""
#root = ET.parse('test.xml').getroot() # from file
root = ET.fromstring(xml_data) # from variable
recursive_xml(root, 0)
print_xmltree(root)
# Note that sobsobitem was forced up in the hierarchy and that the parent tag of subsubitem did not matter (sibetum).
wait = input("Press Enter to Exit.")
This outputs:
<tree>
<category title="Item 1">item 1 text
<sibetum title="subitem3">subitem13 text</sibetum>
</category>
<category title="Item 2">item 2 text
<sobsobitem title="subsubitem2">wrong tag</sobsobitem>
</category>
<category title="Item 3">item 3 text
</category>
</tree>
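The earlier caution about modifying a tree while iterating over it can be seen in a small experiment. This is a minimal sketch with made-up sample XML: it removes children once while looping over the element directly and once while looping over a copy of its child list. On CPython the first loop typically leaves some children behind, because each removal shifts the remaining siblings under the iterator.

```python
import xml.etree.ElementTree as ET

XML = '<p><a/><a/><a/><a/></p>'

# Removing while iterating over the element itself: siblings get skipped.
unsafe = ET.fromstring(XML)
for child in unsafe:
    unsafe.remove(child)
left_unsafe = len(unsafe)

# Iterating over a copy of the child list: every child is visited.
safe = ET.fromstring(XML)
for child in list(safe):
    safe.remove(child)
left_safe = len(safe)

print(left_unsafe, left_safe)
```

This is why the script above loops over parent.findall('./'), which returns a separate list, instead of looping over parent itself.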