How can I extract XML attributes with Python when another attribute appears in a given list?


I have a list of linkId values.

links_o_i = [652518,  345004, 225317, 177396, 551734]

In addition, I have an XML file with the following structure:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE facilities SYSTEM "http://www.matsim.org/files/dtd/facilities_v1.dtd">
<facilities name="Facilities from different sources">

<!-- ====================================================================== -->

    <facility id="10002" linkId="666355" x="2684102.0" y="1253168.0">
        <activity type="other">
        </activity>

        <activity type="work">
        </activity>

    </facility>

<!-- ====================================================================== -->

    <facility id="10007" linkId="961312" x="2683486.0" y="1247853.0">
        <activity type="other">
        </activity>

        <activity type="work">
        </activity>

    </facility>

<!-- ====================================================================== -->

    <facility id="100070" linkId="652518" x="2684238.0" y="1246568.0">
        <activity type="leisure">
        </activity>

        <activity type="other">
        </activity>

        <activity type="work">
        </activity>

    </facility>

<!-- ====================================================================== -->

    <facility id="100071" linkId="1063278" x="2689220.0" y="1243493.0">
        <activity type="leisure">
        </activity>

        <activity type="other">
        </activity>

        <activity type="work">
        </activity>

    </facility>

<!-- ====================================================================== -->

    <facility id="100072" linkId="786540" x="2680812.0" y="1249375.0">
        <activity type="leisure">
        </activity>

        <activity type="other">
        </activity>

        <activity type="work">
        </activity>

    </facility>

<!-- ====================================================================== -->

    <facility id="100073" linkId="225317" x="2681506.0" y="1249508.0">
        <activity type="other">
        </activity>

        <activity type="shop">
        </activity>

        <activity type="work">
        </activity>

    </facility>

</facilities>

I want to parse the XML file and extract the x and y values of every facility whose linkId is in the list links_o_i.

The goal is a data frame with three columns: linkId, x and y.

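So far my approach produces no results, and I am struggling to find out why. Note that both the list and the XML file are much larger than shown here.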

import gzip
import xml.etree.ElementTree as ET
from collections import defaultdict
import pandas as pd


tree = ET.iterparse(gzip.open("file.xml.gz", 'r'))
link_coords = defaultdict(list)
for xml_event, elem in tree:
    attributes = elem.attrib
    if elem.tag == 'facility' \
    and elem.attrib["linkId"] in links_o_i:
        link_coords[attributes['linkId']].append[attributes['x', 'y']]
    elem.clear()  
link_coords = pd.DataFrame.from_dict(link_coords)
python xml pandas xml-parsing elementtree
1 Answer

You can use xmltodict to parse the data into a dict and extract your data from it:

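import pandas as pd
import xmltodict

# data holds the XML document as a string;
# keep only the linkId, x and y attributes of each facility
extract = [{k: v for k, v in ent.items() if k in ['@linkId', '@x', '@y']}
           for ent in xmltodict.parse(data)['facilities']['facility']]

# filter for only entries in the list (attribute values are strings, hence int())
res = [ent for ent in extract if int(ent['@linkId']) in links_o_i]

# read into dataframe
pd.DataFrame(res)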

     @linkId    @x          @y
0   652518  2684238.0   1246568.0
1   225317  2681506.0   1249508.0
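
The ElementTree attempt in the question finds nothing because XML attribute values are strings ("652518"), while links_o_i holds integers, so the membership test never succeeds. Once that is fixed, the append[...] call and the attributes['x', 'y'] lookup would also fail. Below is a minimal sketch of a streaming fix that keeps iterparse. The names facility_coords and sample are hypothetical and used only for illustration, and the small inline document stands in for the real file.

```python
import io
import xml.etree.ElementTree as ET


def facility_coords(source, link_ids):
    """Yield (linkId, x, y) for every <facility> whose linkId is in link_ids."""
    # Attribute values are strings, so compare against a set of strings.
    wanted = {str(link_id) for link_id in link_ids}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "facility":
            if elem.get("linkId") in wanted:
                yield (int(elem.get("linkId")),
                       float(elem.get("x")),
                       float(elem.get("y")))
            # Drop the attributes and children of handled facilities
            # so memory stays low on large files.
            elem.clear()


# Hypothetical sample in the same shape as the file in the question.
sample = b"""<facilities>
  <facility id="100070" linkId="652518" x="2684238.0" y="1246568.0">
    <activity type="work"></activity>
  </facility>
  <facility id="10002" linkId="666355" x="2684102.0" y="1253168.0">
    <activity type="work"></activity>
  </facility>
  <facility id="100073" linkId="225317" x="2681506.0" y="1249508.0">
    <activity type="shop"></activity>
  </facility>
</facilities>"""

links_o_i = [652518, 345004, 225317, 177396, 551734]
rows = list(facility_coords(io.BytesIO(sample), links_o_i))
print(rows)
```

For the real data, passing gzip.open("file.xml.gz", "rb") as source should work the same way, and pd.DataFrame(rows, columns=["linkId", "x", "y"]) turns the rows into the three-column data frame. Unlike the xmltodict approach, this reads the file incrementally, which may matter if the XML is very large.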