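I am parsing an XML string in Python and am looking for an XPath expression that retrieves only the elements that explicitly declare a default namespace (xmlns, with no prefix).
I am working with this example: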
from lxml import etree
xml = '''
<TELCAL_DATAMANAGER_BDNT_1
xmlns="urn:schemas-cosylab-com:BulkDataNTReceiver:1.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:schemas-cosylab-com:BulkDataNTReceiver:1.0 BulkDataNTReceiver.xsd">
<ReceiverStream Name="InterferometricStream1" participantPerStream="true">
<ReceiverFlow Name="FullSpectralFlow" cbReceiveAvgProcessTimeoutSec="0.0018" multicastAddress="225.3.2.100"></ReceiverFlow>
<ReceiverFlow Name="ChannelAveragesFlow" multicastAddress="225.3.2.200"></ReceiverFlow>
<ReceiverFlow Name="WVRFlow" multicastAddress="225.3.2.30"></ReceiverFlow>
</ReceiverStream>
<ReceiverStream Name="InterferometricStream2" participantPerStream="true">
<ReceiverFlow Name="FullSpectralFlow" cbReceiveAvgProcessTimeoutSec="0.0018" multicastAddress="225.3.2.100"></ReceiverFlow>
<ReceiverFlow Name="ChannelAveragesFlow" multicastAddress="225.3.2.200"></ReceiverFlow>
<ReceiverFlow Name="WVRFlow" multicastAddress="225.3.2.30"></ReceiverFlow>
</ReceiverStream>
<ReceiverStream Name="InterferometricStream3" participantPerStream="true">
<ReceiverFlow Name="FullSpectralFlow" cbReceiveAvgProcessTimeoutSec="0.0018" multicastAddress="225.3.2.100"></ReceiverFlow>
<ReceiverFlow Name="ChannelAveragesFlow" multicastAddress="225.3.2.200"></ReceiverFlow>
<ReceiverFlow Name="WVRFlow" multicastAddress="225.3.2.30"></ReceiverFlow>
</ReceiverStream>
</TELCAL_DATAMANAGER_BDNT_1>
'''
root = etree.fromstring(xml)
elements = root.xpath("//*[namespace-uri()!='' and not(local-name()='*')]")
for element in elements:
    print(element)
I get this result:
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}TELCAL_DATAMANAGER_BDNT_1 at 0x7f8100613640>
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}ReceiverStream at 0x7f80ffaaaf00>
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}ReceiverFlow at 0x7f80ffaaaf40>
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}ReceiverFlow at 0x7f80ffaaaec0>
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}ReceiverFlow at 0x7f80ffaaac80>
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}ReceiverStream at 0x7f80ffaaad40>
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}ReceiverFlow at 0x7f80ffaaac40>
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}ReceiverFlow at 0x7f80ffaaad80>
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}ReceiverFlow at 0x7f80ffaaacc0>
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}ReceiverStream at 0x7f80ffaaad00>
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}ReceiverFlow at 0x7f80ffaaadc0>
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}ReceiverFlow at 0x7f80ffaaab40>
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}ReceiverFlow at 0x7f80ffaaae00>
But I only need the TELCAL_DATAMANAGER_BDNT_1 element, which is the only one that carries xmlns="urn:schemas-cosylab-com:BulkDataNTReceiver:1.0".
Expected result:
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}TELCAL_DATAMANAGER_BDNT_1 at 0x7f8100613640>
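Is there an XPath expression that does this?
This XPath returns the elements that introduce the namespace rather than the ones that inherit it. XPath does not expose xmlns declarations as attributes, so the expression selects elements whose namespace URI differs from their parent's: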
//*[namespace-uri()!='' and namespace-uri() != namespace-uri(parent::*) and not(contains(name(), ':'))]
The not(contains(name(), ':')) test filters out elements whose names carry a namespace prefix.
from lxml import etree
xml = '''
<root>
<TELCAL_DATAMANAGER_BDNT_1 xmlns="urn:schemas-cosylab-com:BulkDataNTReceiver:1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:schemas-cosylab-com:BulkDataNTReceiver:1.0 BulkDataNTReceiver.xsd">
<ReceiverStream Name="InterferometricStream1" participantPerStream="true">
<ReceiverFlow Name="FullSpectralFlow" cbReceiveAvgProcessTimeoutSec="0.0018" multicastAddress="225.3.2.100"/>
<ReceiverFlow Name="ChannelAveragesFlow" multicastAddress="225.3.2.200"/>
<ReceiverFlow Name="WVRFlow" multicastAddress="225.3.2.30"/>
</ReceiverStream>
<ns1:ele xmlns:ns1="example.com">
<ns1:el2>a</ns1:el2>
<some xmlns="urn:schemas-cosylab-com:BulkDataNTReceiver:1.0">
<other>o</other>
</some>
</ns1:ele>
</TELCAL_DATAMANAGER_BDNT_1>
</root>
'''
root = etree.fromstring(xml)
elements = root.xpath("//*[namespace-uri()!='' and namespace-uri() != namespace-uri(parent::*) and not(contains(name(), ':'))]")
for element in elements:
    print(element)
Result:
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}TELCAL_DATAMANAGER_BDNT_1 at 0x7f0bc0a74748>
<Element {urn:schemas-cosylab-com:BulkDataNTReceiver:1.0}some at 0x7f0bc0a74708>
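If you remove the namespace-uri() != namespace-uri(parent::*) condition, every unprefixed element in a default namespace is returned, including the ones that merely inherit it.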
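To make the effect of the parent comparison concrete, here is a minimal sketch that runs both variants of the expression against a small document. The sample XML, the constant names, and the local_names helper are made up for illustration and are not part of the original question.

```python
from lxml import etree

# Hypothetical sample: "a" and "d" introduce the default namespace,
# "b" inherits it, and "p:c" is in a prefixed namespace.
XML = b"""
<root>
  <a xmlns="urn:example:default">
    <b/>
    <p:c xmlns:p="urn:example:prefixed">
      <d xmlns="urn:example:default"/>
    </p:c>
  </a>
</root>
"""

WITH_PARENT_CHECK = (
    "//*[namespace-uri()!='' "
    "and namespace-uri() != namespace-uri(parent::*) "
    "and not(contains(name(), ':'))]"
)
WITHOUT_PARENT_CHECK = "//*[namespace-uri()!='' and not(contains(name(), ':'))]"


def local_names(doc, expr):
    """Return the local names of the elements matched by expr, in document order."""
    return [etree.QName(el).localname for el in doc.xpath(expr)]


doc = etree.fromstring(XML)
with_check = local_names(doc, WITH_PARENT_CHECK)
without_check = local_names(doc, WITHOUT_PARENT_CHECK)

print(with_check)     # ['a', 'd']
print(without_check)  # ['a', 'b', 'd']
```

With the comparison in place only a and d are matched; without it, b is matched as well because it inherits the default namespace from a.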
Note: each matched node is returned together with its children, which may or may not carry namespace declarations of their own.
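One limitation of the XPath above: because it compares namespace URIs, it will not match an element that redeclares the same default namespace its parent already uses. If the declaration itself matters, a different technique may fit better. The sketch below is not XPath at all; it uses the "start-ns" events of the standard library's xml.etree.ElementTree.iterparse, which are reported just before the "start" event of the element that carries the declaration. The function name and the sample document are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: "outer", "again" and "inner" each carry an explicit
# xmlns="..." declaration; "plain" only inherits it.
SAMPLE = b"""
<root>
  <outer xmlns="urn:example:default">
    <plain/>
    <again xmlns="urn:example:default"/>
    <p:wrap xmlns:p="urn:example:prefixed">
      <inner xmlns="urn:example:default"/>
    </p:wrap>
  </outer>
</root>
"""


def elements_declaring_default_ns(xml_bytes):
    """Return the tags of elements that carry a non-empty xmlns="..." declaration."""
    declared = []
    pending = False  # True once a default-namespace declaration has been seen
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns", "start"))
    for event, payload in events:
        if event == "start-ns":
            prefix, uri = payload
            # An empty prefix means the default namespace.
            if prefix == "" and uri:
                pending = True
        elif pending:
            # This "start" event belongs to the element that made the declaration.
            declared.append(payload.tag)
            pending = False
    return declared


for tag in elements_declaring_default_ns(SAMPLE):
    print(tag)
```

Here again is reported even though its namespace is identical to its parent's, which the XPath comparison would skip. The trade-off is that this approach walks the parser events instead of querying an already parsed tree.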