How to recursively collect the ids of all links with lxml in Python 3

Problem description

I have XML like this:

<library>
    <content content-id="title001">
        <content-links>
            <content-link content-id="Number1" />
            <content-link content-id="Number2" />
        </content-links>
    </content>
    <content content-id="title002">
        <content-links>
            <content-link content-id="Number3" />
        </content-links>
    </content>
    <content content-id="Number1">
        <content-links>
            <content-link content-id="Number1b" />
        </content-links>
    </content>
</library>

I need to get all the content-ids that are linked, directly or indirectly, from a specific content-id title. For example, in this case I need every id reachable from title001 (I may need more titles, so the starting titles come from a list). All of these ids should be collected into a list like this: [title001, Number1, Number2, Number1b]

So I think I need to check each content element recursively: take the content-id from each of its content-link children, jump to the content element with that id, check that element's content-links in turn, and so on until the whole XML has been traversed.

I haven't been able to come up with a recursive solution for this.

Adding the code I have so far:

from lxml import etree as et

def get_ids(content):
    """Recursively follow the content-links of a content element."""
    content_links = content.findall('content-links/content-link')
    for content_link in content_links:
        cl = content_link.get('content-id')
        print(content_link, cl)
        # Jump to the content element this link points to, if there is one.
        # Note the attribute is content-id, not id.
        cont = x.find(f'content[@content-id="{cl}"]')
        if cont is not None:
            get_ids(cont)

if __name__ == '__main__':
    x = et.fromstring(xml)  # xml holds the document shown above
    ids = ['title001']
    for id in ids:
        content = x.find(f'content[@content-id="{id}"]')
        if content is not None:
            get_ids(content)
python-3.x recursion xml-parsing lxml
1 Answer
Try the following code:

from lxml import etree as et

parser = et.XMLParser(remove_blank_text=True)
tree = et.parse('Input.xml', parser)
root = tree.getroot()

cidList = ['title001']                 # Your source list
cidDct = { x: 0 for x in cidList }     # dict keys double as an ordered set

for elem in root.iter('content'):
    cid = elem.attrib.get('content-id', '')
    # print(f'{elem.tag:15} {cid}')
    if cid in cidDct.keys():
        # print(f'Found: {cid}')
        for elem2 in elem.iter():
            cid2 = elem2.attrib.get('content-id', '')
            if len(cid2) > 0:
                # print(f'Add: {cid2}')
                cidDct[cid2] = 0
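The loop above relies on document order (a linked content element must appear after the element that links to it). The recursion the question describes can also be written directly; below is a minimal sketch using the question's sample XML, with the final ids returned as a list. The helper name collect_ids and the cycle guard are my additions, not part of the original code:

```python
from lxml import etree as et

# Sample document from the question (closing </content> tag fixed).
xml = b"""<library>
    <content content-id="title001">
        <content-links>
            <content-link content-id="Number1" />
            <content-link content-id="Number2" />
        </content-links>
    </content>
    <content content-id="title002">
        <content-links>
            <content-link content-id="Number3" />
        </content-links>
    </content>
    <content content-id="Number1">
        <content-links>
            <content-link content-id="Number1b" />
        </content-links>
    </content>
</library>"""

def collect_ids(root, cid, seen):
    """Depth-first walk: record cid, then follow its content-links."""
    if cid in seen:        # guard against circular links
        return
    seen.append(cid)
    content = root.find(f'content[@content-id="{cid}"]')
    if content is None:    # leaf id with no content element of its own
        return
    for link in content.findall('content-links/content-link'):
        collect_ids(root, link.get('content-id'), seen)

root = et.fromstring(xml)
result = []
collect_ids(root, 'title001', result)
print(result)  # ['title001', 'Number1', 'Number1b', 'Number2']
```

Because the ids are appended before descending, the result is in depth-first order; Number1b appears before Number2 since the walk follows Number1's links first.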
