xml.etree.ElementTree.ParseError when trying to extract data from XML with Python 3


I am having a problem extracting the email addresses from an XML file with Python 3.

My code is:

import xml.etree.ElementTree as ET
import ssl

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

data = '''<row>
    <row _id="row-jyi7-56ru_b7km" _uuid="00000000-0000-0000-B614-7FFDD7C1595B" _position="0" _address="https://www.dati.lombardia.it/resource/zzzz-zzzz/row-jyi7-56ru_b7km">
        <codice_regionale>MI1604</codice_regionale>
        <denom_farmacia>Farmacia Varesina</denom_farmacia>
        <indirizzo>VIA VARESINA, 121</indirizzo>
        <localita>Milano</localita>
        <telefono>3480813398</telefono>
        <email>[email protected]</email>
        <caratterizzazione>urbana</caratterizzazione>
        <esenzioni>true</esenzioni>
        <location latitude="45.500881" longitude="9.141339"/>
</row>'''

tree = ET.fromstring(data) #standard ET
results = tree.findall('email') #find the count section in xml

print(results.text)

The error I get is:

Traceback (most recent call last):
  File "farmacie.py", line 25, in <module>
    tree = ET.fromstring(data) #standard ET
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/xml/etree/ElementTree.py", line 1321, in XML
    return parser.close()
xml.etree.ElementTree.ParseError: no element found: line 12, column 6

How can I fix this?
python xml elementtree
1 Answer
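It looks like you have opened the row element twice (or you are missing the extra closing tag), so the parser reaches the end of the input with an element still open, and that is what raises the error. Next, findall() returns a list, so you need to pick one item from it or print them all: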

import xml.etree.ElementTree as ET

data = '''<row _id="row-jyi7-56ru_b7km" _uuid="00000000-0000-0000-B614-7FFDD7C1595B" _position="0" _address="https://www.dati.lombardia.it/resource/zzzz-zzzz/row-jyi7-56ru_b7km">
        <codice_regionale>MI1604</codice_regionale>
        <denom_farmacia>Farmacia Varesina</denom_farmacia>
        <indirizzo>VIA VARESINA, 121</indirizzo>
        <localita>Milano</localita>
        <telefono>3480813398</telefono>
        <email>[email protected]</email>
        <caratterizzazione>urbana</caratterizzazione>
        <esenzioni>true</esenzioni>
        <location latitude="45.500881" longitude="9.141339"/>
</row>'''

tree = ET.fromstring(data) #standard ET
results = tree.findall('email') #find the count section in xml

print(results[0].text)

Or:

for r in results:
    print(r.text)
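To make the cause of the error easier to see, here is a minimal sketch that reproduces it with a shortened document and shows one way to report where parsing stopped. The documents, the address a@example.org, and the helper name first_email are all made up for illustration; they are not taken from the real dataset.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened documents for illustration only.
# BROKEN opens <row> twice but closes it once, like the data in the question.
BROKEN = "<row>\n    <row>\n        <email>a@example.org</email>\n</row>"
FIXED = "<row>\n    <email>a@example.org</email>\n</row>"


def first_email(xml_text):
    """Return the text of the first direct <email> child, or None."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple showing where parsing stopped.
        print("parse failed:", err, "at", err.position)
        return None
    # findtext() returns the text of the first match, or None if nothing matches,
    # so no list indexing is needed when only one value is wanted.
    return root.findtext("email")


print(first_email(BROKEN))
print(first_email(FIXED))
```

If only a single value is needed, findtext() (or find()) is probably a better fit than findall(), since it avoids indexing into a list that might be empty.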

Update:

Now that the complete dataset is available, the correct way to get all of the emails is:

import xml.etree.ElementTree as ET
import requests

data = requests.get('https://www.dati.lombardia.it/api/views/5dq5-xs9z/rows.xml').content
tree = ET.fromstring(data)
results = tree.findall("./row/row/email")

for r in results:
    print(r.text)

Output (2,684 lines):

[email protected]
[email protected]
[email protected]
[email protected]
...
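If you want to try the "./row/row/email" path without downloading anything, the sketch below runs the same kind of lookup against a small inline document. The root tag name (response), the pharmacy names, the addresses, and the helper emails_by_pharmacy are assumptions made for this example; only the row/row nesting and the denom_farmacia and email tags come from the code above. It also skips records that have no email element, which may or may not occur in the real data.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the downloaded document; the root tag name
# and all sample values are assumptions made for this illustration.
SAMPLE = """<response>
  <row>
    <row _position="0">
      <denom_farmacia>Farmacia A</denom_farmacia>
      <email>a@example.org</email>
    </row>
    <row _position="1">
      <denom_farmacia>Farmacia B</denom_farmacia>
    </row>
    <row _position="2">
      <denom_farmacia>Farmacia C</denom_farmacia>
      <email>c@example.org</email>
    </row>
  </row>
</response>"""


def emails_by_pharmacy(xml_text):
    """Return (name, email) pairs for every record that has an <email>."""
    root = ET.fromstring(xml_text)
    pairs = []
    # Each record is a <row> nested inside the outer <row> wrapper.
    for record in root.findall("./row/row"):
        name = record.findtext("denom_farmacia")
        email = record.findtext("email")  # None when the record has no <email>
        if email:
            pairs.append((name, email.strip()))
    return pairs


for name, email in emails_by_pharmacy(SAMPLE):
    print(name, email)
```

Iterating over the records rather than over the email elements directly keeps each address paired with the pharmacy it belongs to.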
