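I would like to convert this XML file, countries, into a table in CSV format, but I am having trouble parsing it and extracting data from it. I am trying to turn it into a table with 5 columns: ['CtryNm', 'CcyNm', 'Ccy', 'CcyNbr', 'CcyMnrUnts']. Here is a snippet of the XML file's structure: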
<ISO_4217 Pblshd="2024-01-01">
<CcyTbl>
<CcyNtry>
<CtryNm>AFGHANISTAN</CtryNm>
<CcyNm>Afghani</CcyNm>
<Ccy>AFN</Ccy>
<CcyNbr>971</CcyNbr>
<CcyMnrUnts>2</CcyMnrUnts>
</CcyNtry>
<CcyNtry>
<CtryNm>ZZ07_No_Currency</CtryNm>
<CcyNm>The codes assigned for transactions where no currency is involved</CcyNm>
<Ccy>XXX</Ccy>
<CcyNbr>999</CcyNbr>
<CcyMnrUnts>N.A.</CcyMnrUnts>
</CcyNtry>
......
</CcyTbl>
</ISO_4217>
I tried the following:
import xml.etree.ElementTree as ET

def parse_xml(xml_file):
    tree = ET.parse(xml_file)
    root = tree.getroot()
    return root

def extract_data(root):
    data = []
    for record in root.findall('CcyNtry'):
        row = {}
        for field in record:
            row[field.tag] = field.text
        data.append(row)
    return data

def main(xml_file, csv_file):
    root = parse_xml(xml_file)
    data = extract_data(root)
    df = to_dataframe(data)
    to_csv(df, csv_file)

main(countries, 'output.csv')
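But this just seems to return an empty file. Does anyone know what I am doing wrong here, or whether there is a simple way to convert the data in this XML to a DataFrame? Thanks.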
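A likely cause of the empty file: `root.findall('CcyNtry')` only matches direct children of the root element, and in the snippet above the `<CcyNtry>` elements sit one level deeper, inside `<CcyTbl>`. Searching at any depth with `root.iter('CcyNtry')` (or `root.findall('.//CcyNtry')`) should pick them up. Below is a minimal sketch using only the standard library. `SAMPLE_XML`, `COLUMNS` and `write_csv` are names made up for this illustration, and the sample is a trimmed stand-in for the real file.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample with the same nesting as the file in the question.
SAMPLE_XML = """<ISO_4217 Pblshd="2024-01-01">
  <CcyTbl>
    <CcyNtry>
      <CtryNm>AFGHANISTAN</CtryNm>
      <CcyNm>Afghani</CcyNm>
      <Ccy>AFN</Ccy>
      <CcyNbr>971</CcyNbr>
      <CcyMnrUnts>2</CcyMnrUnts>
    </CcyNtry>
    <CcyNtry>
      <CtryNm>ANTARCTICA</CtryNm>
      <CcyNm>No universal currency</CcyNm>
    </CcyNtry>
  </CcyTbl>
</ISO_4217>"""

COLUMNS = ["CtryNm", "CcyNm", "Ccy", "CcyNbr", "CcyMnrUnts"]


def extract_data(root):
    """Return one dict per <CcyNtry>, searching at any depth below root."""
    return [
        {field.tag: field.text for field in record}
        for record in root.iter("CcyNtry")
    ]


def write_csv(rows, out):
    """Write rows to a file-like object; missing fields become empty cells."""
    writer = csv.DictWriter(
        out, fieldnames=COLUMNS, restval="", extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(rows)


root = ET.fromstring(SAMPLE_XML)

# The original call: empty, because only <CcyTbl> is a direct child of the root.
direct_children = root.findall("CcyNtry")

rows = extract_data(root)
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

For the real file, `ET.parse(xml_file).getroot()` would replace `ET.fromstring(SAMPLE_XML)`, and the output would go to a file opened with `open(csv_file, "w", newline="")` instead of the in-memory buffer.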
A version with BeautifulSoup:
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://www.six-group.com/dam/download/financial-information/data-center/iso-currrency/lists/list-one.xml"
soup = BeautifulSoup(requests.get(url).content, "xml")

# Build one dict per <CcyNtry>, keyed by child tag name.
data = []
for c in soup.select("CcyNtry"):
    data.append({t.name: t.text for t in c.find_all()})

df = pd.DataFrame(data)
print(df.head(10))
Prints:
                CtryNm                  CcyNm  Ccy  CcyNbr  CcyMnrUnts
0          AFGHANISTAN                Afghani  AFN     971           2
1        ÅLAND ISLANDS                   Euro  EUR     978           2
2              ALBANIA                    Lek  ALL     008           2
3              ALGERIA         Algerian Dinar  DZD     012           2
4       AMERICAN SAMOA              US Dollar  USD     840           2
5              ANDORRA                   Euro  EUR     978           2
6               ANGOLA                 Kwanza  AOA     973           2
7             ANGUILLA  East Caribbean Dollar  XCD     951           2
8           ANTARCTICA  No universal currency  NaN     NaN         NaN
9  ANTIGUA AND BARBUDA  East Caribbean Dollar  XCD     951           2
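If pandas is already in use, `pandas.read_xml` may be the shortest route to a DataFrame. The sketch below assumes pandas 1.5 or later (for the `dtype` argument) and uses a trimmed, hypothetical `SAMPLE_XML` string in place of the real file. `parser="etree"` is chosen so that lxml is not required, and `dtype=str` is meant to keep numeric codes such as 008 as text rather than integers.

```python
from io import StringIO

import pandas as pd

# Trimmed, hypothetical sample with the same nesting as the real file.
SAMPLE_XML = """<ISO_4217 Pblshd="2024-01-01">
  <CcyTbl>
    <CcyNtry>
      <CtryNm>AFGHANISTAN</CtryNm>
      <CcyNm>Afghani</CcyNm>
      <Ccy>AFN</Ccy>
      <CcyNbr>971</CcyNbr>
      <CcyMnrUnts>2</CcyMnrUnts>
    </CcyNtry>
    <CcyNtry>
      <CtryNm>ALBANIA</CtryNm>
      <CcyNm>Lek</CcyNm>
      <Ccy>ALL</Ccy>
      <CcyNbr>008</CcyNbr>
      <CcyMnrUnts>2</CcyMnrUnts>
    </CcyNtry>
  </CcyTbl>
</ISO_4217>"""

# xpath=".//CcyNtry" selects every <CcyNtry> at any depth; each becomes a row.
df = pd.read_xml(
    StringIO(SAMPLE_XML),
    xpath=".//CcyNtry",
    parser="etree",  # standard-library parser, so lxml is not needed
    dtype=str,       # read every column as text
)

# index=False leaves the row index out of the CSV.
csv_text = df.to_csv(index=False)
print(csv_text)
```

With the real file, the path or URL would be passed in place of `StringIO(SAMPLE_XML)`, and `df.to_csv('output.csv', index=False)` would write the result to disk.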