I have XML like this:
<library>
  <content content-id="title001">
    <content-links>
      <content-link content-id="Number1" />
      <content-link content-id="Number2" />
    </content-links>
  </content>
  <content content-id="title002">
    <content-links>
      <content-link content-id="Number3" />
    </content-links>
  </content>
  <content content-id="Number1">
    <content-links>
      <content-link content-id="Number1b" />
    </content-links>
  </content>
</library>
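I need to get every content-id that is linked to a particular title content-id. For example, in this case I need all the IDs linked to title001 (I may need more titles, so the input is a list of titles to look up), and I want to add all of those IDs to a list like this: [title001, Number1, Number2, Number1b]
So I think I need to check each content recursively: take the content-id from each content-link to move on to the next content, check all of that content's content-links in turn, and keep going until the whole XML has been read.
I couldn't find a recursive solution for this.
Here is the code I have so far: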
from lxml import etree as et

def get_ids(root, content):
    """Print every content-link under `content` and follow it to the linked <content>."""
    content_links = content.findall('content-links/content-link')
    print(content_links)
    for content_link in content_links:
        cl = content_link.get('content-id')
        print(content_link, cl)
        # The attribute is named content-id, not id
        cont = root.find(f'content[@content-id="{cl}"]')
        if cont is not None:
            get_ids(root, cont)

if __name__ == '__main__':
    root = et.parse('Input.xml').getroot()  # the XML above, saved as Input.xml
    ids = ['title001']
    for cid in ids:
        content = root.find(f'content[@content-id="{cid}"]')
        get_ids(root, content)
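A minimal recursive sketch that collects the IDs into a list instead of printing them. Assumptions: it uses the standard library's `xml.etree.ElementTree` instead of lxml (the `find`/`findall` calls used here should behave the same in both), embeds the XML above as a string, and `collect_ids` is a hypothetical helper name. Each ID is recorded before its `<content>` is looked up, so IDs such as Number2 that have no `<content>` element still end up in the list, and the `found` check stops infinite recursion if the links ever form a cycle.

```python
import xml.etree.ElementTree as ET

# The XML from the question (stdlib ElementTree used here instead of lxml)
XML = """<library>
  <content content-id="title001">
    <content-links>
      <content-link content-id="Number1" />
      <content-link content-id="Number2" />
    </content-links>
  </content>
  <content content-id="title002">
    <content-links>
      <content-link content-id="Number3" />
    </content-links>
  </content>
  <content content-id="Number1">
    <content-links>
      <content-link content-id="Number1b" />
    </content-links>
  </content>
</library>"""


def collect_ids(root, cid, found=None):
    """Return cid plus every content-id reachable from it through content-links."""
    if found is None:
        found = []
    if cid in found:  # already visited: stops infinite recursion on cyclic links
        return found
    found.append(cid)
    # Same lookup as in the question: the <content> whose content-id matches
    content = root.find(f"content[@content-id='{cid}']")
    if content is not None:  # ids such as Number2 have no <content> of their own
        for link in content.findall("content-links/content-link"):
            collect_ids(root, link.get("content-id"), found)
    return found


root = ET.fromstring(XML)
titles = ["title001", "title002"]  # the list of titles to start from
result = {title: collect_ids(root, title) for title in titles}
print(result)
# {'title001': ['title001', 'Number1', 'Number1b', 'Number2'], 'title002': ['title002', 'Number3']}
```

Because the walk is depth-first, Number1b comes out before Number2; the set of IDs is the same as in the list the question asks for.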
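A non-recursive alternative makes a single pass over the content elements in document order, using a dict as an ordered set of the collected IDs:

from lxml import etree as et

parser = et.XMLParser(remove_blank_text=True)
tree = et.parse('Input.xml', parser)
root = tree.getroot()
cidList = ['title001']  # Your source list
# dict used as an ordered set: keys keep their insertion order
cidDct = {x: 0 for x in cidList}
# Single pass in document order: a <content> is expanded only if its id
# has already been collected when the loop reaches it
for elem in root.iter('content'):
    cid = elem.attrib.get('content-id', '')
    if cid in cidDct:
        # elem.iter() yields elem itself and all its descendants, including every content-link
        for elem2 in elem.iter():
            cid2 = elem2.attrib.get('content-id', '')
            if cid2:
                cidDct[cid2] = 0
print(list(cidDct))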
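The single pass above depends on document order: if the content-id="Number1" element appeared before title001, Number1b would be missed. Below is a sketch of an order-independent, non-recursive variant. It assumes the same stdlib `ElementTree` swap, and `linked_ids` is a hypothetical helper name. It builds an id-to-element index once and walks the links breadth-first with a queue, which also reproduces the order shown in the question.

```python
import xml.etree.ElementTree as ET
from collections import deque

# Same data as the question, but Number1 deliberately comes before title001
XML = """<library>
  <content content-id="Number1">
    <content-links>
      <content-link content-id="Number1b" />
    </content-links>
  </content>
  <content content-id="title001">
    <content-links>
      <content-link content-id="Number1" />
      <content-link content-id="Number2" />
    </content-links>
  </content>
  <content content-id="title002">
    <content-links>
      <content-link content-id="Number3" />
    </content-links>
  </content>
</library>"""


def linked_ids(root, start):
    """Breadth-first walk over content-links, independent of element order."""
    # Index every <content> by its content-id once, instead of one find() per link
    index = {c.get("content-id"): c for c in root.iter("content")}
    found = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        content = index.get(queue.popleft())
        if content is None:  # e.g. Number2: linked, but no <content> of its own
            continue
        for link in content.findall("content-links/content-link"):
            cid = link.get("content-id")
            if cid not in seen:  # skip ids already reached (this also handles cycles)
                seen.add(cid)
                found.append(cid)
                queue.append(cid)
    return found


root = ET.fromstring(XML)
print(linked_ids(root, "title001"))  # ['title001', 'Number1', 'Number2', 'Number1b']
```

For a list of titles, call `linked_ids` once per title; building the index inside the function keeps the sketch short, but it could be built once and passed in when many titles are looked up.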