Finding elements by namespace in lxml

Question

I have an XML file that contains elements such as gnc:account (it is a GnuCash accounts file). I want to find all elements with that name.

However, if I do this:

for account in tree.iter('gnc:account'):
    print(account)

nothing is printed. Instead, I wrote this ridiculous piece of code:

def n(string):
    pair = string.split(':')
    return '{{{}}}{}'.format(root.nsmap[pair[0]], pair[1])

Now I can do this:

for account in tree.iter(n('gnc:account')):
    print(account)

This works.

Is there a non-ridiculous solution to this problem? I would rather not write out the full URI.

Answer 1

I think what you have now is definitely too hacky.

Solution using XPath

You can use XPath and register the namespace URI together with its prefix:

>>> from io import StringIO
>>> from lxml import etree
>>> s = """<root xmlns:gnc="www.gnc.com">
... <gnc:account>1</gnc:account>
... <gnc:account>2</gnc:account>
... </root>"""
>>> tree = etree.parse(StringIO(s))

# show that without the prefix, there are no results
>>> tree.xpath("//account")
[]

# with an unregistered prefix, throws an error
>>> tree.xpath("//gnc:account")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "src/lxml/etree.pyx", line 2287, in lxml.etree._ElementTree.xpath
  File "src/lxml/xpath.pxi", line 359, in lxml.etree.XPathDocumentEvaluator.__call__
  File "src/lxml/xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result
lxml.etree.XPathEvalError: Undefined namespace prefix

# correct way of registering the namespace
>>> tree.xpath("//gnc:account", namespaces={'gnc': 'www.gnc.com'})
[<Element {www.gnc.com}account at 0x112bdd808>, <Element {www.gnc.com}account at 0x112bdd948>]
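Since the question asks how to avoid typing the full URI, here is a minimal sketch that reuses the prefixes already declared in the document by passing the root element's nsmap to xpath(). It assumes the same sample document as above. The variable names root, ns and texts are illustrative only. Note that xpath() rejects a None key (a default namespace), so the sketch filters it out.

```python
from io import StringIO
from lxml import etree

s = """<root xmlns:gnc="www.gnc.com">
<gnc:account>1</gnc:account>
<gnc:account>2</gnc:account>
</root>"""
root = etree.parse(StringIO(s)).getroot()

# nsmap maps the prefixes visible on this element to their namespace URIs.
# xpath() does not accept a None prefix, so drop a default namespace if present.
ns = {prefix: uri for prefix, uri in root.nsmap.items() if prefix is not None}

# The prefix in the expression is resolved through the mapping built above.
accounts = root.xpath("//gnc:account", namespaces=ns)
texts = [a.text for a in accounts]
print(texts)
```

This only sees prefixes declared on the root element or its ancestors. A prefix declared deeper in the tree would have to be added to the mapping by hand.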

Sticking with tree.iter()

If you still want to call iter() this way, you need to follow lxml's advice on using namespaces with iter, for example:

>>> for account in tree.iter('{www.gnc.com}account'):
...     print(account)
...
<Element {www.gnc.com}account at 0x112bdd808>
<Element {www.gnc.com}account at 0x112bdd948>
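If the goal is simply to build that '{uri}name' string without hand-formatting it, one possible sketch uses etree.QName together with the element's nsmap. The helper name qualified is hypothetical, and the sample document is the same one assumed above.

```python
from lxml import etree

root = etree.fromstring(
    '<root xmlns:gnc="www.gnc.com">'
    '<gnc:account>1</gnc:account>'
    '<gnc:account>2</gnc:account>'
    '</root>'
)

def qualified(element, prefixed_name):
    """Hypothetical helper: expand 'gnc:account' to '{uri}account' via element.nsmap."""
    prefix, local = prefixed_name.split(":", 1)
    # QName(uri, local).text yields the '{uri}local' form that iter() expects.
    return etree.QName(element.nsmap[prefix], local).text

tag = qualified(root, "gnc:account")
texts = [a.text for a in root.iter(tag)]
print(tag, texts)
```

Passing the element explicitly avoids relying on a global root, which is the main difference from the helper in the question.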

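And if you absolutely want to avoid writing out the namespace URI or registering the namespace (which I don't think is a valid argument, since doing so is very easy and clearer), you can also use: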

>>> for account in tree.iter('{*}account'):
...     print(account)
...
<Element {www.gnc.com}account at 0x112bdd808>
<Element {www.gnc.com}account at 0x112bdd948>
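One caveat with the wildcard form is that it matches the local name in every namespace. The sketch below illustrates this with a made-up document in which a second, hypothetical namespace (www.example.com) also defines an account element.

```python
from lxml import etree

# Hypothetical document: two unrelated namespaces both define <account>.
root = etree.fromstring(
    '<root xmlns:gnc="www.gnc.com" xmlns:other="www.example.com">'
    '<gnc:account>1</gnc:account>'
    '<other:account>2</other:account>'
    '</root>'
)

# The wildcard matches "account" regardless of its namespace.
any_ns = [el.tag for el in root.iter('{*}account')]

# The fully qualified form matches only the gnc namespace.
gnc_only = [el.tag for el in root.iter('{www.gnc.com}account')]

print(any_ns)
print(gnc_only)
```

For a file where only one namespace uses the name, the two forms return the same elements, so the wildcard is a reasonable shortcut there.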
