Extracting elements that have no xml:lang attribute

python xml lxml
2 Answers
2 votes

You can check whether lang is absent from the element's attributes:

from lxml import etree

xml_string = """
<components version="1.0.0">
    <component type="foo">
        <sample>Foo</sample>
        <sample lang="a">abc</sample>
        <sample lang="b">efj</sample>
    </component>
</components>
"""

root = etree.fromstring(xml_string)

for sample in root.findall("component/sample"):
    if "lang" not in sample.attrib:
        print(sample.text)

Prints:

Foo
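
The same check can be wrapped in a small helper that returns the matching texts instead of printing them. This is only a minimal sketch: the function name texts_without_attribute is hypothetical, and the fallback to the standard library's xml.etree.ElementTree is an assumption for environments where lxml is not installed (both expose fromstring, findall and attrib for this purpose).

```python
try:
    from lxml import etree  # the library used in the answer
except ImportError:
    # Assumption: fall back to the standard library if lxml is missing.
    import xml.etree.ElementTree as etree

XML_STRING = """
<components version="1.0.0">
    <component type="foo">
        <sample>Foo</sample>
        <sample lang="a">abc</sample>
        <sample lang="b">efj</sample>
    </component>
</components>
"""


def texts_without_attribute(root, path, attribute):
    """Return the text of every element matched by `path` that lacks `attribute`."""
    return [
        element.text
        for element in root.findall(path)
        if attribute not in element.attrib
    ]


root = etree.fromstring(XML_STRING)
result = texts_without_attribute(root, "component/sample", "lang")
print(result)
```

Returning a list keeps the filtering separate from the output, so the result can be reused or tested.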

Edit: if the attribute is namespaced, as in xml:lang, you can try:

from lxml import etree

xml_string = """
<components version="1.0.0">
    <component type="foo">
        <sample>Foo</sample>
        <sample xml:lang="a">abc</sample>
        <sample xml:lang="b">efj</sample>
    </component>
</components>
"""

root = etree.fromstring(xml_string)

for sample in root.findall("component/sample"):
    # use http://www.w3.org/XML/1998/namespace here
    # or other Namespace URI found in your document
    lang = sample.attrib.get(r"{http://www.w3.org/XML/1998/namespace}lang")
    if not lang:
        print(sample.text)
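
One detail of the snippet above may matter in practice: `if not lang` also treats an empty xml:lang="" as missing. The sketch below contrasts that with an `is None` check. The sample element with an empty xml:lang is hypothetical data added for illustration, the constant names XML_NS and XML_LANG are made up here, and the fallback import is an assumption for environments without lxml.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the standard library if lxml is missing.
    import xml.etree.ElementTree as etree

# Namespace URI bound to the reserved "xml" prefix.
XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_LANG = f"{{{XML_NS}}}lang"  # "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical input: the second sample carries an empty xml:lang.
xml_string = """
<components version="1.0.0">
    <component type="foo">
        <sample>Foo</sample>
        <sample xml:lang="">Bar</sample>
        <sample xml:lang="a">abc</sample>
    </component>
</components>
"""

root = etree.fromstring(xml_string)
samples = root.findall("component/sample")

# Truthiness check: an empty xml:lang="" counts as missing.
falsy = [s.text for s in samples if not s.attrib.get(XML_LANG)]
# Identity check: only elements with no xml:lang attribute at all.
missing = [s.text for s in samples if s.get(XML_LANG) is None]

print(falsy)    # ['Foo', 'Bar']
print(missing)  # ['Foo']
```

Which of the two is right depends on whether an empty language tag should count as "no language" in your data.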

0 votes

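Your XML snippet has an unclosed tag, and the attribute values a and b must be quoted strings, "a" and "b". Once the XML parses as valid, you can check with .get('attrib_argument'), which returns None when the attribute is absent: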

from lxml import etree as et

xml_str = """<components version="1.0.0">
    <component type="foo">
        <sample>Foo</sample>
        <sample lang="a">abc</sample>
        <sample lang="b">efj</sample>
    </component>
</components>
"""

root = et.fromstring(xml_str)

for position, elem in enumerate(root.findall('.//sample')):
    # elem.get() returns None when the attribute is absent
    if elem.get('lang') is None:
        print(f"sample <tag> on list position {position} has no 'lang' attrib, Text: {elem.text}")

Output:

sample <tag> on list position 0 has no 'lang' attrib, Text: Foo
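
The first point of this answer, that unquoted attribute values keep the document from parsing, can be checked directly. A minimal sketch, assuming the question's input looked roughly like the hypothetical `broken` string below; the helper name parses is made up for illustration, and the fallback import is an assumption for environments without lxml.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the standard library if lxml is missing.
    import xml.etree.ElementTree as etree

# Hypothetical malformed input: the attribute value is not quoted.
broken = '<component><sample lang=a>abc</sample></component>'
# The same fragment with the value quoted.
fixed = '<component><sample lang="a">abc</sample></component>'


def parses(xml_text):
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        etree.fromstring(xml_text)
    except etree.ParseError:
        # lxml raises XMLSyntaxError here, a subclass of its ParseError.
        return False
    return True


print(parses(broken))  # False
print(parses(fixed))   # True
```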