Extracting content from an XML file with BeautifulSoup in Python

python xml validation beautifulsoup xml-parsing
1 Answer

If I understand you correctly, you want to get every <Trainers> element and all of its name/value pairs:

from bs4 import BeautifulSoup

xml_doc = """\
<Trainers>
 <Trainer name="VisitorID" value=" NPRoiuKL213kiolkm2231"/>
 <Trainer name="VisitorNumber" value="BR-76594823-009922"/>
 <Trainer name="ServerIndex" value="213122"/>
 <Trainer name="VisitorPolicyID" value="ETR1234123"/>
</Trainers>"""

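# The "xml" parser is provided by the lxml package, which must be installed.
soup = BeautifulSoup(xml_doc, "xml")

# Walk every <Trainers> element, then every <Trainer> inside it.
for item in soup.select("Trainers"):
    for trainer in item.select("Trainer"):
        print(trainer["name"], trainer["value"])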

Prints:

VisitorID  NPRoiuKL213kiolkm2231
VisitorNumber BR-76594823-009922
ServerIndex 213122
VisitorPolicyID ETR1234123
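
The first printed line contains two spaces because the VisitorID value in the sample starts with a space. Below is a minimal sketch, assuming you would rather collect the pairs into a dictionary with that whitespace stripped. The helper name trainer_pairs is hypothetical and not part of BeautifulSoup:

```python
from bs4 import BeautifulSoup

xml_doc = """\
<Trainers>
 <Trainer name="VisitorID" value=" NPRoiuKL213kiolkm2231"/>
 <Trainer name="VisitorNumber" value="BR-76594823-009922"/>
 <Trainer name="ServerIndex" value="213122"/>
 <Trainer name="VisitorPolicyID" value="ETR1234123"/>
</Trainers>"""


def trainer_pairs(xml_text):
    """Return {name: value} for every <Trainer>, with whitespace stripped.

    Hypothetical helper written for this example only.
    """
    soup = BeautifulSoup(xml_text, "xml")  # the "xml" parser requires lxml
    pairs = {}
    for trainer in soup.find_all("Trainer"):
        # .get() returns None for a missing attribute instead of raising KeyError.
        name = trainer.get("name")
        value = trainer.get("value")
        if name is None or value is None:
            continue  # skip elements that lack either attribute
        pairs[name.strip()] = value.strip()
    return pairs


pairs = trainer_pairs(xml_doc)
print(pairs["VisitorID"])  # NPRoiuKL213kiolkm2231 (leading space removed)
```

Using .get() rather than trainer["name"] is a design choice for input where some <Trainer> elements may be incomplete. For the sample above, both approaches see the same four elements.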

If you want to build a pandas DataFrame from the data, you can use this example:

import pandas as pd

# One row per <Trainers> element; each Trainer name becomes a column.
df = pd.DataFrame(
    [
        {t["name"]: t["value"] for t in item.select("Trainer")}
        for item in soup.select("Trainers")
    ]
)
print(df)

Prints:

                VisitorID       VisitorNumber ServerIndex VisitorPolicyID
0   NPRoiuKL213kiolkm2231  BR-76594823-009922      213122      ETR1234123
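
If the same name can occur more than once under one <Trainers> element, the dictionary-based frame above keeps only the last value for that name. A possible alternative, assuming a long layout with one row per <Trainer> is acceptable, is sketched here. The variable df_long and the column names name and value are illustrative choices:

```python
import pandas as pd
from bs4 import BeautifulSoup

xml_doc = """\
<Trainers>
 <Trainer name="VisitorID" value=" NPRoiuKL213kiolkm2231"/>
 <Trainer name="VisitorNumber" value="BR-76594823-009922"/>
 <Trainer name="ServerIndex" value="213122"/>
 <Trainer name="VisitorPolicyID" value="ETR1234123"/>
</Trainers>"""

soup = BeautifulSoup(xml_doc, "xml")  # the "xml" parser requires lxml

# One row per <Trainer>, so repeated names are kept instead of overwritten.
rows = [
    {"name": t["name"], "value": t["value"].strip()}
    for t in soup.find_all("Trainer")
]
df_long = pd.DataFrame(rows, columns=["name", "value"])
print(df_long.shape)  # (4, 2)
```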