Get XML elements only if they explicitly define a default namespace, using Python and XPath

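I am parsing an XML string in Python, and I am looking for an XPath expression that retrieves only the elements that explicitly define a default namespace (xmlns, with no prefix).

I am working with this example: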

from lxml import etree

xml = '''
<TELCAL_DATAMANAGER_BDNT_1
    xmlns="urn:schemas-cosylab-com:BulkDataNTReceiver:1.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:schemas-cosylab-com:BulkDataNTReceiver:1.0 BulkDataNTReceiver.xsd">
    <ReceiverStream Name="InterferometricStream1" participantPerStream="true">
        <ReceiverFlow Name="FullSpectralFlow" cbReceiveAvgProcessTimeoutSec="0.0018" multicastAddress="225.3.2.100"></ReceiverFlow>
        <ReceiverFlow Name="ChannelAveragesFlow" multicastAddress="225.3.2.200"></ReceiverFlow>
        <ReceiverFlow Name="WVRFlow" multicastAddress="225.3.2.30"></ReceiverFlow>
    </ReceiverStream>
    <ReceiverStream Name="InterferometricStream2" participantPerStream="true">
        <ReceiverFlow Name="FullSpectralFlow" cbReceiveAvgProcessTimeoutSec="0.0018" multicastAddress="225.3.2.100"></ReceiverFlow>
        <ReceiverFlow Name="ChannelAveragesFlow" multicastAddress="225.3.2.200"></ReceiverFlow>
        <ReceiverFlow Name="WVRFlow" multicastAddress="225.3.2.30"></ReceiverFlow>
    </ReceiverStream>
    <ReceiverStream Name="InterferometricStream3" participantPerStream="true">
        <ReceiverFlow Name="FullSpectralFlow" cbReceiveAvgProcessTimeoutSec="0.0018" multicastAddress="225.3.2.100"></ReceiverFlow>
        <ReceiverFlow Name="ChannelAveragesFlow" multicastAddress="225.3.2.200"></ReceiverFlow>
        <ReceiverFlow Name="WVRFlow" multicastAddress="225.3.2.30"></ReceiverFlow>
    </ReceiverStream>
</TELCAL_DATAMANAGER_BDNT_1>
'''

root = etree.fromstring(xml)

elements = root.xpath("//*[namespace-uri()!='' and not(local-name()='*')]")

for element in elements:
    print(element)

I get this result:

<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}TELCAL_DATAMANAGER_BDNT_1 at 0x7f8100613640>
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}ReceiverStream at 0x7f80ffaaaf00>
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}ReceiverFlow at 0x7f80ffaaaf40>
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}ReceiverFlow at 0x7f80ffaaaec0>
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}ReceiverFlow at 0x7f80ffaaac80>
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}ReceiverStream at 0x7f80ffaaad40>
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}ReceiverFlow at 0x7f80ffaaac40>
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}ReceiverFlow at 0x7f80ffaaad80>
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}ReceiverFlow at 0x7f80ffaaacc0>
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}ReceiverStream at 0x7f80ffaaad00>
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}ReceiverFlow at 0x7f80ffaaadc0>
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}ReceiverFlow at 0x7f80ffaaab40>
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}ReceiverFlow at 0x7f80ffaaae00>

But I only need the TELCAL_DATAMANAGER_BDNT_1 element, which is the only one that carries xmlns="urn:schemas-cosylab-com:BulkDataNTReceiver:1.0".

Expected result:

<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}TELCAL_DATAMANAGER_BDNT_1 at 0x7f8100613640>

Is there an XPath expression that achieves this?

1 Answer

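This XPath returns the elements at which a namespace takes effect rather than those that merely inherit it from their parent. It works by comparing each element's namespace URI with its parent's, so an element that redeclares the namespace its parent is already in is not matched: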

//*[namespace-uri()!='' and namespace-uri() != namespace-uri(parent::*) and not(contains(name(), ':'))]

The not(contains(name(), ':')) condition filters out elements whose names carry a namespace prefix.

from lxml import etree

xml = '''
<root>
  <TELCAL_DATAMANAGER_BDNT_1 xmlns="urn:schemas-cosylab-com:BulkDataNTReceiver:1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:schemas-cosylab-com:BulkDataNTReceiver:1.0 BulkDataNTReceiver.xsd">
    <ReceiverStream Name="InterferometricStream1" participantPerStream="true">
      <ReceiverFlow Name="FullSpectralFlow" cbReceiveAvgProcessTimeoutSec="0.0018" multicastAddress="225.3.2.100"/>
      <ReceiverFlow Name="ChannelAveragesFlow" multicastAddress="225.3.2.200"/>
      <ReceiverFlow Name="WVRFlow" multicastAddress="225.3.2.30"/>
    </ReceiverStream>
    <ns1:ele xmlns:ns1="example.com">
      <ns1:el2>a</ns1:el2>
      <some xmlns="urn:schemas-cosylab-com:BulkDataNTReceiver:1.0">
        <other>o</other>
      </some>
    </ns1:ele>
  </TELCAL_DATAMANAGER_BDNT_1>
</root>
'''

root = etree.fromstring(xml)

elements = root.xpath("//*[namespace-uri()!='' and namespace-uri() != namespace-uri(parent::*) and not(contains(name(), ':'))]")

for element in elements:
    print(element)

Result:

<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}TELCAL_DATAMANAGER_BDNT_1 at 0x7f0bc0a74748>
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}some at 0x7f0bc0a74708>

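If you remove the namespace-uri() != namespace-uri(parent::*) condition, every unprefixed element that is in a namespace is returned, including those that only inherit the default namespace.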
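
To make the effect of the parent comparison concrete, here is a minimal sketch that runs the expression with and without that condition. The document, the urn:example:* URIs, and the local_names helper are invented for illustration and are not part of the original answer.

```python
from lxml import etree

# Hypothetical reduced document: a plain root, one default-namespace subtree,
# and one prefixed element inside it.
XML = """
<root>
  <a xmlns="urn:example:default">
    <b><c/></b>
    <p:d xmlns:p="urn:example:prefixed"/>
  </a>
</root>
"""

WITH_PARENT_CHECK = (
    "//*[namespace-uri()!='' "
    "and namespace-uri() != namespace-uri(parent::*) "
    "and not(contains(name(), ':'))]"
)
WITHOUT_PARENT_CHECK = "//*[namespace-uri()!='' and not(contains(name(), ':'))]"


def local_names(doc, expr):
    """Return the local names of the elements matched by expr (helper for this sketch)."""
    return [etree.QName(el).localname for el in doc.xpath(expr)]


doc = etree.fromstring(XML)
with_check = local_names(doc, WITH_PARENT_CHECK)
without_check = local_names(doc, WITHOUT_PARENT_CHECK)

print(with_check)     # only the element where the default namespace starts
print(without_check)  # also the descendants that inherit it
```

In both cases the prefixed p:d element is dropped by the name() test; only the parent comparison decides whether the inheriting b and c elements are included.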

Note: each matched element is returned together with its child nodes, which may or may not carry namespace declarations of their own.
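
As a quick check against the shape of the document in the question, the sketch below applies the same expression to a shortened copy of that XML (the xsi attributes and most of the streams are left out for brevity, which is an assumption of this example). It also shows that the single matched element still carries its whole subtree.

```python
from lxml import etree

# Shortened copy of the question's document; the root declares the default namespace.
XML = """
<TELCAL_DATAMANAGER_BDNT_1 xmlns="urn:schemas-cosylab-com:BulkDataNTReceiver:1.0">
    <ReceiverStream Name="InterferometricStream1" participantPerStream="true">
        <ReceiverFlow Name="FullSpectralFlow" multicastAddress="225.3.2.100"/>
        <ReceiverFlow Name="WVRFlow" multicastAddress="225.3.2.30"/>
    </ReceiverStream>
</TELCAL_DATAMANAGER_BDNT_1>
"""

EXPR = (
    "//*[namespace-uri()!='' "
    "and namespace-uri() != namespace-uri(parent::*) "
    "and not(contains(name(), ':'))]"
)

root = etree.fromstring(XML)
matches = root.xpath(EXPR)

# The root element has no parent element, so namespace-uri(parent::*) is the
# empty string and the comparison succeeds for the root only.
matched_names = [etree.QName(el).localname for el in matches]

# The match is a live element: its descendants are still attached to it.
descendant_names = [etree.QName(el).localname for el in matches[0].iterdescendants()]

print(matched_names)
print(descendant_names)
```

If only the matched element's own data is needed, read its tag and attributes directly instead of serializing it, since serialization would include the inherited-namespace children as well.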
