<PREAMB>
<AGENCY TYPE="S">HOMELAND SECURITY </AGENCY>
<AGENCY TYPE="O">LABOR</AGENCY>
<AGY>
<HD SOURCE="HED">AGENCY:</HD>
<P>U.S. Citizenship and Immigration Services</P>
</AGY>
</PREAMB>
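How can I get this - "departments are": "HOMELAND SECURITY, LABOR : U.S. Citizenship and Immigration Services"
The code below only returns - "departments are": "LABOR : U.S. Citizenship and Immigration Services"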
for agency in preambl.findall("./PREAMB/AGENCY"):
    departments = agency.text
if departments != '' or departments != None:
    if pre.findall("./PREAMB/AGY"):
        agency1 = ''
        for agencies in pre.findall("./PREAMB/AGY/P"):
            for para1 in agencies.itertext():
                agency1 += para1.replace('\n', ' ')
                agency1 = ' '.join(agency1.split())
            if agency1:
                agency1 = '{"departments are":"' + str(departments) + ' : ' + str(agency1) + '"}'
                agency1 = json.loads(agency1)
Any help would be appreciated.
I think you are overcomplicating this. Try the following:
targets = ['.//AGENCY', './/AGY//P']
agencies = []
for target in targets:
    agencies.extend([agency.text for agency in preambl.findall(target)])
print('agencies are: ', agencies)
See if that gives you the expected output.
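If you want the single combined string from the question rather than a flat list, here is one possible sketch. It assumes the snippet sits under some wrapper element (called `ROOT` here, a made-up name, so that the `./PREAMB/...` paths resolve as they do in the question), and the helpers `clean` and `departments_entry` are hypothetical names used only for illustration. The idea is to collect every `AGENCY` text first and join the results once, instead of overwriting one variable on each loop iteration.

```python
import xml.etree.ElementTree as ET

# Sample from the question, wrapped in a hypothetical <ROOT> element
# (an assumption; the real parent element is not shown in the question).
SAMPLE = """<ROOT>
<PREAMB>
<AGENCY TYPE="S">HOMELAND SECURITY </AGENCY>
<AGENCY TYPE="O">LABOR</AGENCY>
<AGY>
<HD SOURCE="HED">AGENCY:</HD>
<P>U.S. Citizenship and Immigration Services</P>
</AGY>
</PREAMB>
</ROOT>"""


def clean(text):
    """Collapse runs of whitespace and treat a missing text node as empty."""
    return " ".join((text or "").split())


def departments_entry(root):
    """Build the {"departments are": "..."} mapping from one document root."""
    # Collect ALL <AGENCY> texts rather than keeping only the last one.
    departments = [clean(a.text) for a in root.findall("./PREAMB/AGENCY")]
    departments = [d for d in departments if d]

    # itertext() also picks up text inside any nested tags within <P>.
    agencies = [clean("".join(p.itertext()))
                for p in root.findall("./PREAMB/AGY/P")]
    agencies = [a for a in agencies if a]

    value = ", ".join(departments)
    if agencies:
        value += " : " + ", ".join(agencies)
    return {"departments are": value}


root = ET.fromstring(SAMPLE)
result = departments_entry(root)
print(result)
```

Building the dict directly also avoids assembling a JSON string by hand and passing it to `json.loads`, which would fail if any of the text contained a double quote. Call `json.dumps(result)` if you need the JSON text itself.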