How do I get the second previous sibling in XML with Python?


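I have an XML file that I want to iterate over. For a given node, I need to find the node before it (with the tag "text" and the attribute "bbox"). The problem is that when the previous tag has no "bbox" attribute, I want to ignore it and take the element before that one instead, but I don't know how to do that. Here is the code: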

import lxml.etree as etree

from lxml.builder import E

parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse('fe3.xml', parser)
root = tree.getroot()

for x in tree.xpath('//text'):
        bb = x.attrib.get('bbox')
        if bb is not None:
            bb = bb.split(',')
        print('This: ', bb)
        xPrev = x.getprevious()
        bb = None
        if xPrev is not None:
            bb = xPrev.attrib.get('bbox')
            if bb is not None:
                bb = bb.split(',')
        if bb is not None:
            print('  Previous: ', bb)
        else:
            # bb is None in this branch, so step back from xPrev (not bb)
            xx = xPrev.getprevious() if xPrev is not None else None
            print(xx, '  No previous bbox')

For clarity, my XML is structured as follows (the real file is much longer):

<?xml version="1.0" encoding="utf-8"?>
<pages>
    <page id="1" bbox="0.000,0.000,462.047,680.315" rotate="0">
        <textbox id="0" bbox="179.739,592.028,261.007,604.510">
            <textline bbox="179.739,592.028,261.007,604.510">
                <text font="NUMPTY+ImprintMTnum"  bbox="191.745,592.218,199.339,603.578" ncolour="0" size="12.482">C</text>
                <text font="NUMPTY+ImprintMTnum-it"  bbox="192.745,592.218,199.339,603.578" ncolour="0" size="12.333">A</text>
                <text font="NUMPTY+ImprintMTnum-it"  bbox="193.745,592.218,199.339,603.578" ncolour="0" size="12.333">P</text>
                <text font="NUMPTY+ImprintMTnum-it"  bbox="191.745,592.218,199.339,603.578" ncolour="0" size="12.333">I</text>
                <text font="NUMPTY+ImprintMTnum"  bbox="191.745,592.218,199.339,603.578" ncolour="0" size="12.482">T</text>
                <text font="NUMPTY+ImprintMTnum"  bbox="191.745,592.218,199.339,603.578" ncolour="0" size="12.482">O</text>
                <text font="NUMPTY+ImprintMTnum"  bbox="191.745,592.218,199.339,603.578" ncolour="0" size="12.482">L</text>
                <text font="NUMPTY+ImprintMTnum"  bbox="191.745,592.218,199.339,603.578" ncolour="0" size="12.482">O</text>
                <text></text>
                <text font="NUMPTY+ImprintMTnum"  bbox="191.745,592.218,199.339,603.578" ncolour="0" size="12.482">I</text>
                <text font="NUMPTY+ImprintMTnum"  bbox="191.745,592.218,199.339,603.578" ncolour="0" size="12.482">I</text>
                <text font="NUMPTY+ImprintMTnum"  bbox="191.745,592.218,199.339,603.578" ncolour="0" size="12.482">I</text>
                <text></text>
            </textline>
        </textbox>
    </page>
</pages>
python xml xpath tags lxml
1 Answer

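I'm not 100% clear on what you are trying to achieve. That said:

As you iterate over the text nodes, you can simply add a variable and store the previous node's bbox in it.

Here is the code I would use, assuming I have understood your goal correctly: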


x_prev = None
for x in tree.xpath('//text'):
        bb = x.attrib.get('bbox')
        if bb is not None:
            bb = bb.split(',')
        print('This: ', bb)

        if x_prev is not None:
            print('  Previous: ', x_prev)
        else:
            print('  No previous bbox')

        # Store this bounding box for the next loop (to be used as x_prev);
        # nodes without a bbox are skipped so the last known bbox is kept
        if bb is not None:
            x_prev = bb

For clarity, this code replaces your entire loop.
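If the goal is specifically the nearest earlier sibling that carries a bbox (skipping empty <text> elements in the same textline), one possible alternative is to let XPath do the skipping with the preceding-sibling axis. This is only a sketch: the SAMPLE document is a trimmed, made-up stand-in for the XML in the question, and previous_bbox is a hypothetical helper name, not something from the original code.

```python
import lxml.etree as etree

# Trimmed, hypothetical sample modelled on the XML in the question
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<pages>
  <page id="1">
    <textline>
      <text bbox="1,1,2,2">O</text>
      <text></text>
      <text bbox="3,3,4,4">I</text>
      <text></text>
    </textline>
  </page>
</pages>"""


def previous_bbox(node):
    """Return the bbox (as a list of strings) of the nearest preceding
    sibling <text> that has a bbox attribute, or None if there is none."""
    # preceding-sibling is a reverse axis, so [1] picks the closest match
    hits = node.xpath('preceding-sibling::text[@bbox][1]')
    return hits[0].get('bbox').split(',') if hits else None


root = etree.fromstring(SAMPLE)
results = []
# Only visit <text> elements that have a bbox themselves
for x in root.xpath('//text[@bbox]'):
    this_bb = x.get('bbox').split(',')
    prev_bb = previous_bbox(x)
    results.append((this_bb, prev_bb))
    print('This: ', this_bb)
    print('  Previous: ', prev_bb if prev_bb is not None else 'No previous bbox')
```

Unlike the running-variable loop, this only looks at siblings under the same parent, so the first <text> of each textline reports no previous bbox rather than picking up a value from an earlier textline. Whether that is what you want depends on your data.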
