How to convert XML to CSV when one of the values is a tag attribute?


I'm trying to convert an XML file to a CSV file using Python. There are two values in the XML that I want to extract.

Here is my code:

import csv
import xml.etree.ElementTree as ET

def main():

    # define input file
    input_dir = r"my\filepath\input.xml"

    # load the xml file
    tree = ET.parse(input_dir)
    root = tree.getroot()

    # open the output csv file in write mode
    csvfile = open(r'my\filepath\output.csv', 'w', newline='')
    csvwriter = csv.writer(csvfile)

    # write the header row
    csvwriter.writerow(['value1', 'value2'])

    # iterate over the CARD elements and write the data rows
    for card in root.iter('CARD'):
        value1= card.find('value1').text
        csvwriter.writerow([value1])

    # Loop through each 'BAR' element and write its 'VALUE2' attribute
    for foo in root.findall('.//BAR'):
        value2 = foo.get('VALUE2')
        csvwriter.writerow([value2])

    # close the csv file
    csvfile.close()

main()

Now it almost works. The script writes both values to the CSV file, but instead of writing them side by side, it writes them one after another.

So instead of this:

value1 value2
x1 z1
x2 z2

I get this:

value1 value2
x1
x2
z1
z2
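Why this happens: csv.writer's writerow writes exactly one record (one line) per call, and the columns of that line are the items of the list passed in. Calling it once per value therefore always starts a new line. A minimal stdlib sketch, using the placeholder values x1 and z1 from the tables above and an in-memory io.StringIO instead of a real file:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)

# One call per value: each call produces its own line with a single column.
writer.writerow(['x1'])
writer.writerow(['z1'])

# One call per pair: both values share one line, so they land in two columns.
writer.writerow(['x1', 'z1'])

print(buf.getvalue().splitlines())
# ['x1', 'z1', 'x1,z1']
```

So the two values only end up side by side if they are passed to the same writerow call.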

How can I fix this so that the values end up in their own columns?

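I tried passing the csvwriter.writerow function a fixed number of fields (padding with empty strings), but that only added commas to my data and still didn't move the values into the desired columns.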

# iterate over the CARD elements and write the data rows
for card in root.iter('CARD'):
    value1 = card.find('value1').text
    csvwriter.writerow([value1, ''])

# Loop through each 'BAR' element and write its 'VALUE2' attribute
for foo in root.findall('.//BAR'):
    value2 = foo.get('VALUE2')
    csvwriter.writerow(['', value2])

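I also tried indenting the second for loop into the first one, but that didn't help at all. It just shifted the data around, and everything still stayed in the first column.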

# iterate over the CARD elements and write the data rows
for card in root.iter('CARD'):
    value1= card.find('value1').text
    csvwriter.writerow([value1])

    # Loop through each 'BAR' element and write its 'VALUE2' attribute
    for foo in root.findall('.//BAR'):
        value2 = foo.get('VALUE2')
        csvwriter.writerow([value2])
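The nested attempt still misbehaves for a second reason: root.findall('.//BAR') searches the whole document, not just the current card, so every card sees every BAR. A small sketch with a hypothetical two-card document (the values a, b, a1, a2, b1 are made up for illustration) compares searching from root with searching from card:

```python
import xml.etree.ElementTree as ET

# Hypothetical two-card document, trimmed down from the question's XML.
XML = """<QUERY_IMPORT>
  <CARD><value1>a</value1><BAR VALUE2="a1"/><BAR VALUE2="a2"/></CARD>
  <CARD><value1>b</value1><BAR VALUE2="b1"/></CARD>
</QUERY_IMPORT>"""

root = ET.fromstring(XML)
results = {}

for card in root.iter('CARD'):
    name = card.find('value1').text
    # Searching from `root` matches every BAR in the whole document.
    from_root = [bar.get('VALUE2') for bar in root.findall('.//BAR')]
    # Searching from `card` matches only the BARs inside this card.
    from_card = [bar.get('VALUE2') for bar in card.findall('.//BAR')]
    results[name] = (from_root, from_card)
    print(name, from_root, from_card)
# a ['a1', 'a2', 'b1'] ['a1', 'a2']
# b ['a1', 'a2', 'b1'] ['b1']
```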

Here is the XML I'm trying to convert:

<?xml version="1.0" encoding="ISO-8859-1"?>
<QUERY_IMPORT CREATEIFNOTFOUND="1" ARCHIVEMODE="-1">
<IDENTITY>export</IDENTITY>
<CARD VALUE2="11">
<value9>2</value9>
<r_id>11</r_id>
<value1>name_of_something</value1>
<r_begin>20040101</r_begin>
<r_end>
</r_end><BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
</CARD>
<CARD VALUE2="12">
<value9>2</value9>
<r_id>12</r_id>
<value1>name_of_other</value1>
<r_begin>20040101</r_begin>
<r_end>
</r_end><BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
</CARD>
</QUERY_IMPORT>
2 Answers
Answer 1 (score: 1)

Of course you need nested loops, but in the inner loop, look only for the BAR elements of that particular card:

    for card in root.iter('CARD'):
        value1 = card.find('value1').text
        for foo in card.findall('.//BAR'):
            value2 = foo.get('VALUE2')
            csvwriter.writerow([value1, value2])
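For reference, here is the same nested loop as a complete, self-contained sketch. It assumes the answer's sample XML below (inlined as a string, without its <?xml ...?> declaration) and writes to an in-memory buffer; xml_to_csv is a hypothetical helper name, not part of any library:

```python
import csv
import io
import xml.etree.ElementTree as ET

# The answer's sample XML, inlined so the sketch runs without an input file.
SAMPLE = """<QUERY_IMPORT CREATEIFNOTFOUND="1" ARCHIVEMODE="-1">
    <IDENTITY>export</IDENTITY>
    <CARD VALUE2="11">
        <value1>name_of_something</value1>
        <BAR VALUE2="value2-1" />
        <BAR VALUE2="value2-2" />
        <BAR VALUE2="value2-3" />
    </CARD>
    <CARD VALUE2="12">
        <value1>another_name</value1>
        <BAR VALUE2="value2-4" />
        <BAR VALUE2="value2-5" />
        <BAR VALUE2="value2-6" />
    </CARD>
</QUERY_IMPORT>"""


def xml_to_csv(root, out):
    """Write one CSV row per BAR, paired with the value1 of its own CARD."""
    writer = csv.writer(out)
    writer.writerow(['value1', 'value2'])  # header row
    for card in root.iter('CARD'):
        value1 = card.find('value1').text
        # Search relative to `card`, not `root`, so only this card's BARs match.
        for bar in card.findall('.//BAR'):
            writer.writerow([value1, bar.get('VALUE2')])


buf = io.StringIO()
xml_to_csv(ET.fromstring(SAMPLE), buf)
print(buf.getvalue())
```

To use it with the question's files, pass ET.parse(r"my\filepath\input.xml").getroot() and a file opened with open(r'my\filepath\output.csv', 'w', newline='') inside a with block; the with block also closes the file without an explicit close().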

Then, given this sample XML (trimmed from yours, with more varied values)

<?xml version="1.0" encoding="ISO-8859-1"?>
<QUERY_IMPORT CREATEIFNOTFOUND="1" ARCHIVEMODE="-1">
    <IDENTITY>export</IDENTITY>
    <CARD VALUE2="11">
        <value1>name_of_something</value1>
        <BAR VALUE2="value2-1" />
        <BAR VALUE2="value2-2" />
        <BAR VALUE2="value2-3" />
    </CARD>
    <CARD VALUE2="12">
        <value1>another_name</value1>
        <BAR VALUE2="value2-4" />
        <BAR VALUE2="value2-5" />
        <BAR VALUE2="value2-6" />
    </CARD>
</QUERY_IMPORT>

you get these data rows:

name_of_something,value2-1
name_of_something,value2-2
name_of_something,value2-3
another_name,value2-4
another_name,value2-5
another_name,value2-6

Answer 2 (score: -1)

You can easily process your file with Pandas' pd.read_xml and export it with to_csv:

# pip install pandas lxml  (read_xml uses the lxml parser by default)
import pandas as pd

df = pd.read_xml('data.xml', xpath='.//CARD')[['value1', 'VALUE2']]
df.to_csv('data.csv', index=False)  # to csv

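Output (note that VALUE2 here is the CARD element's own attribute, not the VALUE2 of the nested BAR elements):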

>>> df
              value1  VALUE2
0  name_of_something      11
1      name_of_other      12
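If you want to stay with pandas but need the BAR-level VALUE2 values the question asks for, one option (a sketch that swaps read_xml for ElementTree parsing, not what this answer does) is to pair each BAR with its card first and then hand the rows to pandas.DataFrame for export. It assumes pandas is installed; the XML below is a hypothetical, trimmed document in the question's shape:

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical trimmed document in the question's shape.
XML = """<QUERY_IMPORT>
  <CARD VALUE2="11"><value1>name_of_something</value1>
    <BAR VALUE2="value2-1"/><BAR VALUE2="value2-2"/></CARD>
  <CARD VALUE2="12"><value1>name_of_other</value1>
    <BAR VALUE2="value2-3"/></CARD>
</QUERY_IMPORT>"""

root = ET.fromstring(XML)

# One row per BAR, carrying the value1 of the CARD that contains it.
rows = [
    (card.find('value1').text, bar.get('VALUE2'))
    for card in root.iter('CARD')
    for bar in card.findall('BAR')
]

df = pd.DataFrame(rows, columns=['value1', 'value2'])
csv_text = df.to_csv(index=False)  # pass a file path instead to write to disk
print(csv_text)
```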