Flatten nested XML while preserving the parent-child relationship between two tags

Question

I have some XML with nested attributes:

<Music>
  <Groups>
    <Artist>The Beatles</Artist>
    <Releases>
      <Release album="Abbey Road" year="1969" />
      <Release album="The White Album" year="1968" />
    </Releases>
  </Groups>
  <Groups>
    <Artist>Bob Dylan</Artist>
    <Releases>
      <Release album="Blonde on Blonde" year="1966" />
      <Release album="Blood on the Tracks" year="1975" />
    </Releases>
  </Groups>
  <Groups>
    <Artist>The Rolling Stones</Artist>
    <Releases>
      <Release album="Sticky Fingers" year="1971" />
      <Release album="Exile On Main Street" year="1972" />
    </Releases>
  </Groups>
</Music>

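I am trying to get back a six-row DataFrame, but my code builds a many-to-many relationship in which every artist is assigned to every album. Here is my code and the incorrect result: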

import xml.etree.ElementTree as et  # cElementTree was removed in Python 3.9
import pandas as pd

tree=et.parse(r'music.xml')
root=tree.getroot()

Artists=[]
AlbumTitle=[]
ReleaseYear=[]

for x in root.iter('Artist'):
    root1=et.Element('root')
    root1=x
    for records in root.iter('Release'):
        root2=et.Element('root')
        root2=records
        AlbumTitle.append(records.attrib['album'])
        ReleaseYear.append(records.attrib['year'])
        Artists.append(x.text)

df = pd.DataFrame({'Artists': Artists, 
                   'AlbumTitle': AlbumTitle,
                   'ReleaseYear': ReleaseYear})

Current output:

    Artists             AlbumTitle            ReleaseYear
    -------             ----------            -----------
1   The Beatles         Abbey Road            1969
2   The Beatles         The White Album       1968
3   The Beatles         Blonde on Blonde      1966
4   The Beatles         Blood on the Tracks   1975
5   The Beatles         Sticky Fingers        1971
6   The Beatles         Exile On Main Street  1972
7   Bob Dylan           Abbey Road            1969
8   Bob Dylan           The White Album       1968
... ...                 ...                   ...
18  The Rolling Stones  Exile On Main Street  1972


Target output:

    Artists             AlbumTitle            ReleaseYear
    -------             ----------            -----------
1   The Beatles         Abbey Road            1969
2   The Beatles         The White Album       1968
3   Bob Dylan           Blonde on Blonde      1966
4   Bob Dylan           Blood on the Tracks   1975
5   The Rolling Stones  Sticky Fingers        1971
6   The Rolling Stones  Exile On Main Street  1972

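I have read the ElementTree documentation to work out how to make Artists.append keep a strict relationship between each artist and these two attributes, but have had no luck so far. Any help would be appreciated, thanks.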

python xml pandas elementtree
1 Answer

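This should work for you. It loops over each Groups element and reads only the releases inside that group, instead of calling root.iter('Release'), which returns every release in the file for every artist: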

import xml.etree.ElementTree as et  # cElementTree was removed in Python 3.9
import pandas as pd

tree=et.parse(r'music.xml')
root=tree.getroot()

Artists=[]
AlbumTitle=[]
ReleaseYear=[]

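for group in root.iter('Groups'):
    # Each <Groups> element holds one <Artist> followed by one <Releases>
    artist = group[0].text   # first child: <Artist>
    releases = group[1]      # second child: <Releases>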
    for release in releases:
        Artists.append(artist)
        AlbumTitle.append(release.attrib['album'])
        ReleaseYear.append(release.attrib['year'])

df = pd.DataFrame({'Artists': Artists,
                   'AlbumTitle': AlbumTitle,
                   'ReleaseYear': ReleaseYear})
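
The difference from the code in the question comes down to where the search starts. As a minimal sketch (the two-artist XML string is inlined here as an assumption, so the snippet runs without music.xml), iter() called on the root walks the whole document, while iter() called on a single Groups element stays inside that group:

```python
import xml.etree.ElementTree as ET

# Two-artist sample in the same shape as music.xml (inlined for illustration)
XML = """<Music>
  <Groups>
    <Artist>The Beatles</Artist>
    <Releases>
      <Release album="Abbey Road" year="1969" />
      <Release album="The White Album" year="1968" />
    </Releases>
  </Groups>
  <Groups>
    <Artist>Bob Dylan</Artist>
    <Releases>
      <Release album="Blonde on Blonde" year="1966" />
      <Release album="Blood on the Tracks" year="1975" />
    </Releases>
  </Groups>
</Music>"""

root = ET.fromstring(XML)

# Searching from the document root finds every <Release>, whatever its parent
releases_in_document = len(list(root.iter('Release')))

# Searching from one <Groups> element finds only that group's releases
releases_per_group = [len(list(group.iter('Release')))
                      for group in root.iter('Groups')]

print(releases_in_document)  # 4
print(releases_per_group)    # [2, 2]
```

With the three artists in the question, the same effect gives 3 x 6 = 18 rows when searching from the root, versus 2 rows per artist when searching from each group.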

Here is the documentation on parsing XML with ElementTree: https://docs.python.org/3.4/library/xml.etree.elementtree.html

Resulting DataFrame:

              Artists            AlbumTitle ReleaseYear
0         The Beatles            Abbey Road        1969
1         The Beatles       The White Album        1968
2           Bob Dylan      Blonde on Blonde        1966
3           Bob Dylan   Blood on the Tracks        1975
4  The Rolling Stones        Sticky Fingers        1971
5  The Rolling Stones  Exile On Main Street        1972
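
Indexing with group[0] and group[1] relies on Artist always coming before Releases. If that order is not guaranteed, a variant that looks children up by tag name may be safer. The following is only a sketch: flatten_releases and the sample string are hypothetical names introduced for illustration, and the sample deliberately puts Releases before Artist.

```python
import xml.etree.ElementTree as ET

def flatten_releases(xml_text):
    """Return one dict per <Release>, paired with its own group's <Artist>."""
    root = ET.fromstring(xml_text)
    rows = []
    for group in root.iter('Groups'):
        # Look the artist up by tag name rather than by child position
        artist = group.findtext('Artist')
        # Only releases nested under this group's <Releases> element
        for release in group.iterfind('Releases/Release'):
            rows.append({
                'Artists': artist,
                'AlbumTitle': release.get('album'),
                'ReleaseYear': release.get('year'),
            })
    return rows

# Hypothetical sample with the child order swapped
sample = """<Music>
  <Groups>
    <Releases>
      <Release album="Sticky Fingers" year="1971" />
    </Releases>
    <Artist>The Rolling Stones</Artist>
  </Groups>
</Music>"""

rows = flatten_releases(sample)
print(rows)
```

Passing the returned list to pd.DataFrame(rows) should give the same three columns as above. Note that ReleaseYear stays a string, because XML attribute values are always read as text.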