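I'm getting started with data science and pandas, and I'm trying to fill a pandas DataFrame with information from an XML file. Here is my code: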
import xml.etree.cElementTree as et
import pandas as pd
import sys

def getvalueofnode(node):
    """ return node text or None """
    return node.text if node is not None else None

def main():
    parsed_xml = et.parse("test2.xml")
    dfcols = ['Country', 'Club', 'Founded']
    df_xml = pd.DataFrame(columns=dfcols)

    for node in parsed_xml.getroot():
        Country = node.attrib.get('country')
        Club = node.find('Name')
        Founded = node.find('Founded')

        df_xml = df_xml.append(
            pd.Series([Country, getvalueofnode(Club), getvalueofnode(Founded)], index=dfcols),
            ignore_index=True)

    print(df_xml)

main()
This is my output:
   Country  Club  Founded
0     None  None     None
This is my XML file:
<?xml version="1.0"?>
<SoccerFeed timestamp="20181123T153249+0000">
  <SoccerDocument Type="SQUADS Latest" competition_code="FR_L1" competition_id="24" competition_name="French Ligue 1" season_id="2016" season_name="Season 2016/2017">
    <Team country="France" country_id="8" country_iso="FR" region_id="17" region_name="Europe">
      <Founded>1919</Founded>
      <Name>Angers</Name>
      <...>
    <Team country="France" country_id="8" country_iso="FR" region_id="17" region_name="Europe">
      <Founded>1905</Founded>
      <Name>Bastia</Name>
Why can't I get a pandas DataFrame with the information I need? Am I missing something in my code? Thanks for your help.
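In the XML, <Founded> and <Name> are children of the <Team> tag, and the country attribute also belongs to <Team>. Your loop, however, walks over the direct children of the root, and in the excerpt shown the only direct child of <SoccerFeed> is <SoccerDocument>. That element has no country attribute, and find('Name') and find('Founded') only search its direct children, so all three lookups return None and you end up with a single row of None values. We should therefore iterate over the <Team> elements of the XML tree instead. Next, we need somewhere to store the values from each iteration of the for loop, since they become the row values of each column. We can do this by creating a dictionary (df_dict) with the three column names as keys, each set to an empty list. On each iteration we append the current Country, Club and Founded values to the corresponding lists. Finally, we build the DataFrame (df) from this dictionary.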
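To see the difference between looping over the root's direct children and using iter('Team'), here is a small sketch that needs only the standard library. It assumes a trimmed, inline copy of the question's XML (only the country attribute is kept, and the closing tags are filled in) and parses it with ET.fromstring rather than reading test2.xml from disk.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical copy of the question's XML, inlined so the
# example runs without test2.xml on disk.
XML = """<?xml version="1.0"?>
<SoccerFeed timestamp="20181123T153249+0000">
  <SoccerDocument Type="SQUADS Latest" competition_code="FR_L1">
    <Team country="France">
      <Founded>1919</Founded>
      <Name>Angers</Name>
    </Team>
    <Team country="France">
      <Founded>1905</Founded>
      <Name>Bastia</Name>
    </Team>
  </SoccerDocument>
</SoccerFeed>"""

root = ET.fromstring(XML)

# What the question's loop sees: the direct children of <SoccerFeed>.
direct = [child.tag for child in root]
print(direct)                         # ['SoccerDocument']
print(root[0].attrib.get('country'))  # None: <SoccerDocument> has no 'country'
print(root[0].find('Name'))           # None: find() only checks direct children

# iter('Team') walks the whole subtree, so it reaches every <Team>.
teams = [(t.attrib.get('country'), t.findtext('Name'), t.findtext('Founded'))
         for t in root.iter('Team')]
print(teams)  # [('France', 'Angers', '1919'), ('France', 'Bastia', '1905')]
```

root.findall('.//Team') should return the same <Team> elements here, since both it and iter() search descendants at any depth.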
import xml.etree.ElementTree as et  # cElementTree was removed in Python 3.9
import pandas as pd

def main():
    parsed_xml = et.parse("test.xml")
    df_dict = {'Country': [], 'Club': [], 'Founded': []}
    root = parsed_xml.getroot()

    # iter() searches the whole tree, so it finds <Team> at any depth
    for team in root.iter('Team'):
        Country = team.attrib.get('country')
        Club = team.find('Name').text
        Founded = team.find('Founded').text
        df_dict['Country'].append(Country)
        df_dict['Club'].append(Club)
        df_dict['Founded'].append(Founded)

    print('Dict for dataframe: {}'.format(df_dict))
    df = pd.DataFrame(df_dict)
    print("Dataframe: \n{}".format(df))

main()
Here is the output when running this script:
#Output:
Dict for dataframe: {'Country': ['France', 'France'], 'Club': ['Angers', 'Bastia'], 'Founded': ['1919', '1905']}
Dataframe:
   Country    Club  Founded
0   France  Angers     1919
1   France  Bastia     1905
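If some <Team> elements might lack a <Name> or <Founded> child, team.find('Name').text in the code above raises AttributeError because find() returns None. A possible variant, sketched under that assumption, collects one dict per team with Element.findtext(), which returns a default instead of failing. The helper name teams_to_records and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

def teams_to_records(root):
    """Return one dict per <Team> element (hypothetical helper).

    findtext() returns the given default when the child element is
    missing, unlike find(...).text, which fails on None.
    """
    records = []
    for team in root.iter('Team'):
        records.append({
            'Country': team.get('country'),
            'Club': team.findtext('Name', default=None),
            'Founded': team.findtext('Founded', default=None),
        })
    return records

# Hypothetical sample: the second <Team> deliberately has no <Name>.
sample = ET.fromstring(
    '<SoccerFeed><SoccerDocument>'
    '<Team country="France"><Founded>1919</Founded><Name>Angers</Name></Team>'
    '<Team country="France"><Founded>1905</Founded></Team>'
    '</SoccerDocument></SoccerFeed>'
)
records = teams_to_records(sample)
print(records)
```

Passing the result to pd.DataFrame(records) should give the same Country, Club and Founded columns as df_dict, with None in the Club cell of the team that has no <Name>. Building one dict per row also avoids keeping three separate lists in sync.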