I'm trying to convert an XML file to a CSV file using Python. There are two values in the XML that I want to extract.
Here is my code:
import csv
import xml.etree.ElementTree as ET

def main():
    # define input file
    input_dir = r"my\filepath\input.xml"
    # load the xml file
    tree = ET.parse(input_dir)
    root = tree.getroot()
    # open the output csv file in write mode
    csvfile = open(r'my\filepath\output.csv', 'w', newline='')
    csvwriter = csv.writer(csvfile)
    # write the header row
    csvwriter.writerow(['value1', 'value2'])
    # iterate over the CARD elements and write the data rows
    for card in root.iter('CARD'):
        value1 = card.find('value1').text
        csvwriter.writerow([value1])
    # Loop through each 'BAR' element and print its 'VALUE2' attribute
    for foo in root.findall('.//BAR'):
        value2 = foo.get('VALUE2')
        csvwriter.writerow([value2])
    # close the csv file
    csvfile.close()

main()
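It almost works now. The script writes both values to the CSV file, but instead of placing them side by side in their own columns, it writes them one after the other.
So instead of this:
value1 | value2 |
---|---|
x1 | z1 |
x2 | z2 |
I get this:
value1 | value2 |
---|---|
x1 | |
x2 | |
z1 | |
z2 | |
How can I fix this so that each value ends up in its own column?
I tried giving the csvwriter.writerow function a predefined number of columns, but that only added commas to my data and still did not move the values into the desired columns.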
# iterate over the CARD elements and write the data rows
for card in root.iter('CARD'):
    value1 = card.find('value1').text
    csvwriter.writerow([value1, ''])
# Loop through each 'BAR' element and print its 'VALUE2' attribute
for foo in root.findall('.//BAR'):
    value2 = foo.get('VALUE2')
    csvwriter.writerow(['', value2])
I also tried indenting the second for loop so that it sits inside the first one, but that didn't help at all. It only moved the data around, and everything still ended up in the first column.
# iterate over the CARD elements and write the data rows
for card in root.iter('CARD'):
    value1 = card.find('value1').text
    csvwriter.writerow([value1])
    # Loop through each 'BAR' element and print its 'VALUE2' attribute
    for foo in root.findall('.//BAR'):
        value2 = foo.get('VALUE2')
        csvwriter.writerow([value2])
Here is the XML I want to convert:
<?xml version="1.0" encoding="ISO-8859-1"?>
<QUERY_IMPORT CREATEIFNOTFOUND="1" ARCHIVEMODE="-1">
<IDENTITY>export</IDENTITY>
<CARD VALUE2="11">
<value9>2</value9>
<r_id>11</r_id>
<value1>name_of_something</value1>
<r_begin>20040101</r_begin>
<r_end>
</r_end><BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
</CARD>
<CARD VALUE2="12">
<value9>2</value9>
<r_id>12</r_id>
<value1>name_of_other</value1>
<r_begin>20040101</r_begin>
<r_end>
</r_end><BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
<BAR VALUE2="randomvaluestuff">
<VALINTA>1901</VALINTA>
</BAR>
</CARD>
</QUERY_IMPORT>
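Sure, you need nested loops, but in the inner loop search only for the BAR elements of that specific card (card.findall rather than root.findall), and write both values in a single row:
for card in root.iter('CARD'):
    value1 = card.find('value1').text
    for foo in card.findall('.//BAR'):
        value2 = foo.get('VALUE2')
        csvwriter.writerow([value1, value2])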
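To see why the search has to start from card, here is a minimal sketch that compares the two scopes. The tiny XML string and the names all_bars and bars_per_card are made up for illustration and are not taken from the question's data.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-card document, shortened from the question's XML.
XML = """<QUERY_IMPORT>
  <CARD><value1>a</value1><BAR VALUE2="1"/><BAR VALUE2="2"/></CARD>
  <CARD><value1>b</value1><BAR VALUE2="3"/></CARD>
</QUERY_IMPORT>"""

root = ET.fromstring(XML)

# Searching from the root matches every BAR in the whole document.
all_bars = [bar.get("VALUE2") for bar in root.findall(".//BAR")]

# Searching from each CARD matches only the BARs nested in that card.
bars_per_card = {
    card.findtext("value1"): [bar.get("VALUE2") for bar in card.findall(".//BAR")]
    for card in root.iter("CARD")
}

print(all_bars)
print(bars_per_card)
```

With root.findall inside the outer loop, every card would be paired with all BAR elements in the file, which is why the nested attempt in the question only shuffled the rows around.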
Then, given a sample XML (truncated from yours, with more variation in the values):
<?xml version="1.0" encoding="ISO-8859-1"?>
<QUERY_IMPORT CREATEIFNOTFOUND="1" ARCHIVEMODE="-1">
  <IDENTITY>export</IDENTITY>
  <CARD VALUE2="11">
    <value1>name_of_something</value1>
    <BAR VALUE2="value2-1" />
    <BAR VALUE2="value2-2" />
    <BAR VALUE2="value2-3" />
  </CARD>
  <CARD VALUE2="12">
    <value1>another_name</value1>
    <BAR VALUE2="value2-4" />
    <BAR VALUE2="value2-5" />
    <BAR VALUE2="value2-6" />
  </CARD>
</QUERY_IMPORT>
you get
name_of_something,value2-1
name_of_something,value2-2
name_of_something,value2-3
another_name,value2-4
another_name,value2-5
another_name,value2-6
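For reference, here is a self-contained sketch of the whole conversion built around that nested loop. It is only one way to arrange it: the helper names cards_to_rows and write_csv are hypothetical, the XML is a shortened inline sample, and the CSV is written to an in-memory buffer instead of a file so the snippet runs as is.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened inline sample (assumption: same structure as the question's file).
SAMPLE_XML = """<QUERY_IMPORT CREATEIFNOTFOUND="1" ARCHIVEMODE="-1">
  <IDENTITY>export</IDENTITY>
  <CARD VALUE2="11">
    <value1>name_of_something</value1>
    <BAR VALUE2="value2-1" />
    <BAR VALUE2="value2-2" />
  </CARD>
  <CARD VALUE2="12">
    <value1>another_name</value1>
    <BAR VALUE2="value2-4" />
  </CARD>
</QUERY_IMPORT>"""


def cards_to_rows(root):
    """Yield one (value1, VALUE2) pair per BAR, grouped by its parent CARD."""
    for card in root.iter("CARD"):
        # findtext returns the default instead of raising if <value1> is missing
        value1 = card.findtext("value1", default="")
        for bar in card.findall(".//BAR"):
            yield value1, bar.get("VALUE2")


def write_csv(root, out):
    """Write the header and one row per BAR to an open text file object."""
    writer = csv.writer(out)
    writer.writerow(["value1", "value2"])
    writer.writerows(cards_to_rows(root))


root = ET.fromstring(SAMPLE_XML)
buffer = io.StringIO(newline="")  # stands in for open(path, "w", newline="")
write_csv(root, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

For a real file, the same write_csv call should work with ET.parse(input_path).getroot() and a file opened in a with block, which also closes the file if an error occurs partway through.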
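You can use Pandas with pd.read_xml to process your file easily, then export it with to_csv:
# pip install pandas
# (read_xml needs pandas 1.3 or newer; its default parser also requires lxml)
import pandas as pd
# keep the result in df so it can be exported below
df = pd.read_xml('data.xml', xpath='.//CARD')[['value1', 'VALUE2']]
df.to_csv('data.csv', index=False)  # to csv
Output:
>>> df
              value1  VALUE2
0  name_of_something      11
1      name_of_other      12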