xml.etree.ElementTree not parsing one specific attribute value


The data file testfile.xml looks like this:

<?xml version="1.0" encoding="utf-8"?>
<body>
  <body.head>
    <hedline>
      <hl1 style="header">All the things we lost that summer</hl1>
      <hl2 style="standfirst">It was the promise of seals that sold Virginia on this mission.</hl2>
      <hl2 style="dropcap-large"><em class="dropcap">W</em>e are always calling each other names.</hl2>
    </hedline>
  </body.head>
</body>

The script that parses the file looks like this:

import xml.etree.ElementTree as ET
tree = ET.parse('testfile.xml')
root = tree.getroot()
if root.find('body.head') is not None:
    if root.find('body.head').find('hedline') is not None:
        for child1 in root.find('body.head').find('hedline'):
            print("Tag    level 1:" + child1.tag)
            print("Attrib level 1:" + str(child1.attrib))
            print("Text   level 1:" + str(child1.text) + "\n")
            for child2 in child1:
                print("Tag    level 2:" + child2.tag)
                print("Attrib level 2:" + str(child2.attrib))
                print("Text   level 2:" + str(child2.text))

This is the result:

Tag    level 1:hl1
Attrib level 1:{'style': 'header'}
Text   level 1:All the things we lost that summer

Tag    level 1:hl2
Attrib level 1:{'style': 'standfirst'}
Text   level 1:It was the promise of seals that sold Virginia on this mission.

Tag    level 1:hl2
Attrib level 1:{'style': 'dropcap-large'}
Text   level 1:None  <-- THIS IS THE PROBLEM

Tag    level 2:em
Attrib level 2:{'class': 'dropcap'}
Text   level 2:W

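I would like the "Text   level 1:" line to report the value "e are always calling each other names." from the data file, but that text is not picked up, so it ends up as None. How can I parse it correctly? This is Python 3.12 on Windows.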

Thanks, Martin

python-3.x xml xml-parsing
1 Answer

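That's because in ElementTree (and lxml), that string is the .tail of the em element. The .text attribute of hl2 only holds the text that comes before its first child element; any text that follows a child's closing tag is stored in that child's .tail.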
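
A minimal sketch of the text/tail split on the problematic element alone. It parses the third hl2 from a string with ET.fromstring (an assumption made only so the snippet runs without testfile.xml); the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The third <hl2> from testfile.xml, parsed from a string for self-containment.
hl2 = ET.fromstring(
    '<hl2 style="dropcap-large"><em class="dropcap">W</em>'
    'e are always calling each other names.</hl2>'
)
em = hl2[0]  # the first (and only) child element, <em>

print(repr(hl2.text))  # None: nothing precedes the <em> child inside <hl2>
print(repr(em.text))   # 'W': the text between <em> and </em>
print(repr(em.tail))   # 'e are always calling each other names.': text after </em>
```

So the value the question is looking for is reachable as child2.tail in the level-2 loop, not as child1.text.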

See the ElementTree documentation on tail for more information.
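
Applied to the question's script, a trimmed sketch might look like the following. Assumptions: the XML is parsed from a string instead of testfile.xml (XML declaration omitted), the attribute prints are dropped for brevity, and the "Tail" line and the full_texts list are hypothetical additions. It also uses Element.itertext(), which joins every text and tail piece inside an element, as an alternative for when you want the whole heading "We are always calling each other names." including the drop cap.

```python
import xml.etree.ElementTree as ET

# Same content as testfile.xml, inlined so the sketch runs without the file.
XML = """<body>
  <body.head>
    <hedline>
      <hl1 style="header">All the things we lost that summer</hl1>
      <hl2 style="standfirst">It was the promise of seals that sold Virginia on this mission.</hl2>
      <hl2 style="dropcap-large"><em class="dropcap">W</em>e are always calling each other names.</hl2>
    </hedline>
  </body.head>
</body>"""

root = ET.fromstring(XML)
hedline = root.find("body.head/hedline")  # one path lookup instead of chained find() calls

full_texts = []  # hypothetical: collects the complete text of each heading
if hedline is not None:
    for child1 in hedline:
        print("Tag    level 1:" + child1.tag)
        print("Text   level 1:" + str(child1.text))
        for child2 in child1:
            print("Tag    level 2:" + child2.tag)
            print("Text   level 2:" + str(child2.text))
            # Text after </em> is stored on the child's tail, not the parent's text.
            print("Tail   level 2:" + str(child2.tail))
        # itertext() yields child1.text plus each descendant's text and tail, in order.
        full_texts.append("".join(child1.itertext()))
        print("Full text      :" + full_texts[-1] + "\n")
```

Reading child2.tail keeps the drop cap and the rest of the sentence separate, which may be useful for styling; itertext() is the simpler choice when only the plain sentence matters.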

© www.soinside.com 2019 - 2024. All rights reserved.