Extracting data from XML using ElementTree in Python


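I have the following XML file, which I need to parse and export as data in a CSV file. In this file I have two boxes (box_id), which are wrapped in two different parents (parent_box_id), plus the details of each box's contents (element sgtin -> info_sgtin).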

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<doc>
    <info id_reference="2">
        <data_down>
            <tree>
                <box_id>046071598600870568</box_id>
                <parent_box_id>046071598600875594</parent_box_id>
            </tree>
            <tree>
                <box_id>046071598600870575</box_id>
                <parent_box_id>046071598600875594</parent_box_id>
            </tree>
            <tree>
                <sgtin>
                    <info_sgtin>
                        <sgtin>04607008133585B0SE1HVHBGR3A</sgtin>
                        <box_id>046071598600870568</box_id>
                        <gtin>04607008133585</gtin>
                        <series_number>026A</series_number>
                    </info_sgtin>
                </sgtin>
                <parent_box_id>046071598600870568</parent_box_id>
            </tree>
            <tree>
                <sgtin>
                    <info_sgtin>
                        <sgtin>046070081335856F7P78HBVBEH2</sgtin>
                        <box_id>046071598600870568</box_id>
                        <gtin>04607008133585</gtin>
                        <series_number>026A</series_number>
                    </info_sgtin>
                </sgtin>
                <parent_box_id>046071598600870568</parent_box_id>
            </tree>
            <tree>
                <sgtin>
                    <info_sgtin>
                        <sgtin>046070081335854T61H7CSXDE9W</sgtin>
                        <box_id>046071598600870575</box_id>
                        <gtin>04607008133585</gtin>
                        <series_number>026A</series_number>
                    </info_sgtin>
                </sgtin>
                <parent_box_id>046071598600870575</parent_box_id>
            </tree>
        </data_down>
    </info>
</doc>

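For this I decided to use ElementTree in Python, but the problem is that the tree tag in my XML file comes in two variants.

First I iterate over all the detail elements and capture the box_id value, but after that I have to go to the parent and get the parent_box_id in which that box_id is wrapped.

In other words, I want to get the data in the following form: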

parent_box_id       box_id              sgtin                           series_number
046071598600875594  046071598600870568  04607008133585B0SE1HVHBGR3A     026A
046071598600875594  046071598600870568  046070081335856F7P78HBVBEH2     026A
046071598600875595  046071598600870575  046070081335854T61H7CSXDE9W     026A

But I don't know how to get the parent_box_id value. Any support from the community is appreciated.

Here is my code:

import csv
import xml.etree.ElementTree as ET

tree = ET.parse('test.xml')
root = tree.getroot()

# Open the output once in write mode; newline='' lets the csv module handle line endings.
with open('result.csv', 'w', newline='') as myfile:
    writer = csv.writer(myfile, delimiter=';', quotechar='"', quoting=csv.QUOTE_MINIMAL)

    # One CSV row per info_sgtin element.
    for alist in root.iter('info_sgtin'):
        sgtin = alist.find('sgtin').text
        box_id = alist.find('box_id').text
        series = alist.find('series_number').text

        writer.writerow([sgtin, box_id, series])
1 answer

Here is a solution using XPath:

from lxml import etree  # .xpath() is provided by lxml, not by xml.etree.ElementTree

root = etree.parse(...)  # assuming this is your ElementTree
for elem in root.iter("info_sgtin"):
    sgtin = elem.xpath("sgtin")[0].text  # you can use .find as well
    series_number = elem.xpath("series_number")[0].text
    box_id = elem.xpath("box_id")[0].text

    # Step up to the enclosing sgtin element, then take its following
    # sibling, which is the parent_box_id of the same tree element.
    parent_box_id = elem.xpath("parent::sgtin/following-sibling::parent_box_id")[0].text
    print(parent_box_id, box_id, sgtin, series_number)
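
For readers staying with the standard-library xml.etree.ElementTree from the question, which has no .xpath() method and no parent:: axis, the same navigation can probably be done top-down: visit each tree element and descend to sgtin/info_sgtin. The following is only a sketch under that assumption. SAMPLE (a shortened copy of the question's XML) and rows_with_sibling_parent are hypothetical names introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's XML, embedded so the sketch runs on its own.
SAMPLE = """<doc><info id_reference="2"><data_down>
<tree><box_id>046071598600870568</box_id><parent_box_id>046071598600875594</parent_box_id></tree>
<tree><box_id>046071598600870575</box_id><parent_box_id>046071598600875594</parent_box_id></tree>
<tree><sgtin><info_sgtin>
  <sgtin>04607008133585B0SE1HVHBGR3A</sgtin><box_id>046071598600870568</box_id>
  <gtin>04607008133585</gtin><series_number>026A</series_number>
</info_sgtin></sgtin><parent_box_id>046071598600870568</parent_box_id></tree>
<tree><sgtin><info_sgtin>
  <sgtin>046070081335854T61H7CSXDE9W</sgtin><box_id>046071598600870575</box_id>
  <gtin>04607008133585</gtin><series_number>026A</series_number>
</info_sgtin></sgtin><parent_box_id>046071598600870575</parent_box_id></tree>
</data_down></info></doc>"""


def rows_with_sibling_parent(root):
    """Return (parent_box_id, box_id, sgtin, series_number) per item."""
    rows = []
    for tree_el in root.iter("tree"):
        item = tree_el.find("sgtin/info_sgtin")
        if item is None:
            continue  # box-level tree element: no item details inside
        rows.append((
            tree_el.findtext("parent_box_id"),  # direct child of this tree element
            item.findtext("box_id"),
            item.findtext("sgtin"),
            item.findtext("series_number"),
        ))
    return rows


rows = rows_with_sibling_parent(ET.fromstring(SAMPLE))
for row in rows:
    print(*row)
```

Walking down from tree avoids the need for a parent lookup, which ElementTree elements do not offer. Like the XPath version, this reads the parent_box_id that sits next to the sgtin element.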

Output:

046071598600870568 046071598600870568 04607008133585B0SE1HVHBGR3A 026A
046071598600870568 046071598600870568 046070081335856F7P78HBVBEH2 026A
046071598600870575 046071598600870575 046070081335854T61H7CSXDE9W 026A
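
The rows above give the item's own box as parent_box_id, whereas the table in the question asks for the parent of that box. One possible way to get there, sketched here with the standard-library xml.etree.ElementTree, is a two-pass lookup: first map each box_id to its parent_box_id using the tree elements that have box_id as a direct child, then resolve each item's box_id through that map. SAMPLE (a shortened copy of the question's XML), extract_rows and box_parent are hypothetical names, and the sketch assumes every box appears in exactly one box-level tree element.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's XML, embedded so the sketch runs on its own.
SAMPLE = """<doc><info id_reference="2"><data_down>
<tree><box_id>046071598600870568</box_id><parent_box_id>046071598600875594</parent_box_id></tree>
<tree><box_id>046071598600870575</box_id><parent_box_id>046071598600875594</parent_box_id></tree>
<tree><sgtin><info_sgtin>
  <sgtin>04607008133585B0SE1HVHBGR3A</sgtin><box_id>046071598600870568</box_id>
  <gtin>04607008133585</gtin><series_number>026A</series_number>
</info_sgtin></sgtin><parent_box_id>046071598600870568</parent_box_id></tree>
<tree><sgtin><info_sgtin>
  <sgtin>046070081335854T61H7CSXDE9W</sgtin><box_id>046071598600870575</box_id>
  <gtin>04607008133585</gtin><series_number>026A</series_number>
</info_sgtin></sgtin><parent_box_id>046071598600870575</parent_box_id></tree>
</data_down></info></doc>"""


def extract_rows(root):
    """Return (outer parent_box_id, box_id, sgtin, series_number) per item."""
    # Pass 1: box-level tree elements have box_id as a direct child;
    # find() only looks at direct children, so item-level trees are skipped.
    box_parent = {
        t.findtext("box_id"): t.findtext("parent_box_id")
        for t in root.iter("tree")
        if t.find("box_id") is not None
    }
    # Pass 2: resolve each item's box to that box's parent (None if unknown).
    rows = []
    for item in root.iter("info_sgtin"):
        box_id = item.findtext("box_id")
        rows.append((
            box_parent.get(box_id),
            box_id,
            item.findtext("sgtin"),
            item.findtext("series_number"),
        ))
    return rows


rows = extract_rows(ET.fromstring(SAMPLE))
for row in rows:
    print(*row)
```

The resulting tuples can be passed to writer.writerows(rows) on the csv.writer from the question. On the XML shown, both boxes resolve to 046071598600875594; the value 046071598600875595 in the question's table does not occur in that XML.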