I have some XML with nested attributes:
<Music>
  <Groups>
    <Artist>The Beatles</Artist>
    <Releases>
      <Release album="Abbey Road" year="1969" />
      <Release album="The White Album" year="1968" />
    </Releases>
  </Groups>
  <Groups>
    <Artist>Bob Dylan</Artist>
    <Releases>
      <Release album="Blonde on Blonde" year="1966" />
      <Release album="Blood on the Tracks" year="1975" />
    </Releases>
  </Groups>
  <Groups>
    <Artist>The Rolling Stones</Artist>
    <Releases>
      <Release album="Sticky Fingers" year="1971" />
      <Release album="Exile On Main Street" year="1972" />
    </Releases>
  </Groups>
</Music>
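I'm trying to get back a six-row DataFrame, but my code builds a many-to-many relationship in which every artist is assigned to every album. Here is my code and the wrong result: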
import xml.etree.cElementTree as et
import pandas as pd

tree=et.parse(r'music.xml')
root=tree.getroot()

Artists=[]
AlbumTitle=[]
ReleaseYear=[]

for x in root.iter('Artist'):
    root1=et.Element('root')
    root1=x
    for records in root.iter('Release'):
        root2=et.Element('root')
        root2=records
        AlbumTitle.append(records.attrib['album'])
        ReleaseYear.append(records.attrib['year'])
        Artists.append(x.text)

df = pd.DataFrame({'Artists': Artists,
                   'AlbumTitle': AlbumTitle,
                   'ReleaseYear': ReleaseYear})
Current output:
Artists AlbumTitle ReleaseYear
------- ----------- -----
1 The Beatles Abbey Road 1969
2 The Beatles The White Album 1968
3 The Beatles Blonde on Blonde 1966
4 The Beatles Blood on the Tracks 1975
5 The Beatles Sticky Fingers 1971
6 The Beatles Exile On Main Street 1972
7 Bob Dylan Abbey Road 1969
8 Bob Dylan The White Album 1968
... ... ...
18 The Rolling Stones Exile On Main Street 1972
Target output:
Artists AlbumTitle ReleaseYear
------- ----------- -----
1 The Beatles Abbey Road 1969
2 The Beatles The White Album 1968
3 Bob Dylan Blonde on Blonde 1966
4 Bob Dylan Blood on the Tracks 1975
5 The Rolling Stones Sticky Fingers 1971
6 The Rolling Stones Exile On Main Street 1972
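I've read through the ElementTree documentation trying to work out how to make Artists.append keep each artist strictly tied to its own releases when combining the two, but no luck so far. Any help would be greatly appreciated. Thanks!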
This should work for you:
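# xml.etree.cElementTree was removed in Python 3.9; xml.etree.ElementTree
# uses the C accelerator automatically, so it is a drop-in replacement.
import xml.etree.ElementTree as et
import pandas as pd

tree = et.parse(r'music.xml')
root = tree.getroot()

Artists = []
AlbumTitle = []
ReleaseYear = []

# Walk each <Groups> element so an artist stays paired with its own releases
for group in root.iter('Groups'):
    artist = group.find('Artist').text   # the artist of this group only
    releases = group.find('Releases')    # the releases of this group only
    for release in releases:
        Artists.append(artist)
        AlbumTitle.append(release.attrib['album'])
        ReleaseYear.append(release.attrib['year'])

df = pd.DataFrame({'Artists': Artists,
                   'AlbumTitle': AlbumTitle,
                   'ReleaseYear': ReleaseYear})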
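A variant of the same idea, sketched under the assumption that the XML has exactly the shape shown in the question: instead of three parallel lists, it collects one dict per release, using the standard-library calls Element.findtext and Element.iterfind with paths relative to each <Groups> element. The inline MUSIC_XML string and the release_rows helper are hypothetical names introduced only for this example, and it sticks to the standard library so the rows can be checked without pandas.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline copy of the question's music.xml, so the sketch runs on its own.
MUSIC_XML = """
<Music>
  <Groups>
    <Artist>The Beatles</Artist>
    <Releases>
      <Release album="Abbey Road" year="1969" />
      <Release album="The White Album" year="1968" />
    </Releases>
  </Groups>
  <Groups>
    <Artist>Bob Dylan</Artist>
    <Releases>
      <Release album="Blonde on Blonde" year="1966" />
      <Release album="Blood on the Tracks" year="1975" />
    </Releases>
  </Groups>
  <Groups>
    <Artist>The Rolling Stones</Artist>
    <Releases>
      <Release album="Sticky Fingers" year="1971" />
      <Release album="Exile On Main Street" year="1972" />
    </Releases>
  </Groups>
</Music>
"""


def release_rows(root):
    """Return one dict per <Release>, tagged with the artist of its own <Groups>."""
    rows = []
    for group in root.iterfind("Groups"):
        # findtext searches only inside this <Groups>, so the artist stays scoped.
        artist = group.findtext("Artist")
        # The relative path 'Releases/Release' likewise stays inside this group.
        for release in group.iterfind("Releases/Release"):
            rows.append({
                "Artists": artist,
                "AlbumTitle": release.get("album"),
                "ReleaseYear": release.get("year"),
            })
    return rows


rows = release_rows(ET.fromstring(MUSIC_XML))
for row in rows:
    print(row["Artists"], "|", row["AlbumTitle"], "|", row["ReleaseYear"])
```

To read the real file instead, pass et.parse('music.xml').getroot() to the helper. With pandas installed, pd.DataFrame(rows) should give the same six rows, because each dict's keys become the column names.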
For more on parsing XML, see the ElementTree documentation (https://docs.python.org/3.4/library/xml.etree.elementtree.html).
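Why the original code produced 18 rows: Element.iter walks the entire subtree of the element it is called on, so calling root.iter('Release') inside the artist loop revisits every release in the document for each artist (3 artists × 6 releases = 18 rows). The minimal sketch below, on a hypothetical trimmed document with 2 groups of 2 releases each, contrasts that cross product with iteration scoped to the current <Groups> element.

```python
import xml.etree.ElementTree as ET

# A trimmed, hypothetical stand-in for music.xml: 2 groups with 2 releases each.
SAMPLE = """
<Music>
  <Groups>
    <Artist>A</Artist>
    <Releases><Release album="a1" year="1"/><Release album="a2" year="2"/></Releases>
  </Groups>
  <Groups>
    <Artist>B</Artist>
    <Releases><Release album="b1" year="3"/><Release album="b2" year="4"/></Releases>
  </Groups>
</Music>
"""

root = ET.fromstring(SAMPLE)

# Question's pattern: the inner root.iter() walks the WHOLE document each time,
# so every artist is paired with every release (a cross product).
cross = [(artist.text, rel.get("album"))
         for artist in root.iter("Artist")
         for rel in root.iter("Release")]

# Scoped pattern: the inner iteration starts from the current <Groups> element,
# so each artist is paired only with its own releases.
scoped = [(group.findtext("Artist"), rel.get("album"))
          for group in root.iter("Groups")
          for rel in group.iter("Release")]

print(len(cross), len(scoped))  # prints: 8 4
```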
Output:
Artists AlbumTitle ReleaseYear
0 The Beatles Abbey Road 1969
1 The Beatles The White Album 1968
2 Bob Dylan Blonde on Blonde 1966
3 Bob Dylan Blood on the Tracks 1975
4 The Rolling Stones Sticky Fingers 1971
5 The Rolling Stones Exile On Main Street 1972