Parsing sitemap XML in Python 3.x

Question

My XML is structured as follows:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
    <url>
        <loc>hello world 1</loc>
        <image:image>
            <image:loc>this is image loc 1</image:loc>
            <image:title>this is image title 1</image:title>
        </image:image>
        <lastmod>2019-06-19</lastmod>
        <changefreq>daily</changefreq>
        <priority>0.25</priority>
    </url>
    <url>
        <loc>hello world 2</loc>
        <image:image>
            <image:loc>this is image loc 2</image:loc>
            <image:title>this is image title 2</image:title>
        </image:image>
        <lastmod>2020-03-19</lastmod>
        <changefreq>daily</changefreq>
        <priority>0.25</priority>
    </url>
</urlset>

I only want to get:

hello world 1
hello world 2

My Python code is below:

import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()

for url in root.findall('url'):
    loc = url.find('loc').text
    print(loc)

Unfortunately, it prints nothing.

But when I change the XML to

<urlset>
    <url>
        <loc>hello world 1</loc>
        <lastmod>2019-06-19</lastmod>
        <changefreq>daily</changefreq>
        <priority>0.25</priority>
    </url>
    <url>
        <loc>hello world 2</loc>
        <lastmod>2020-03-19</lastmod>
        <changefreq>daily</changefreq>
        <priority>0.25</priority>
    </url>
</urlset>

it gives me the correct result:

hello world 1
hello world 2

How can I get the correct result without changing the XML? Editing a file of more than 10,000 lines by hand makes no sense.

TIA

Answer 1

A direct (if not very elegant) fix for the code is:

import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()

# In find/findall, prefix namespaced tags with the full namespace in braces
for url in root.findall('{http://www.sitemaps.org/schemas/sitemap/0.9}url'):
    loc = url.find('{http://www.sitemaps.org/schemas/sitemap/0.9}loc').text
    print(loc)

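This is necessary because tag names must be qualified with the namespace they are defined in. The xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" attribute on urlset sets a default namespace, so every unprefixed element (url, loc, lastmod, and so on) belongs to it, and a bare 'url' or 'loc' matches nothing. Your second XML works only because it has no namespace declaration. For details on using the find and findall methods with namespaces, see "Parse XML namespace with Element Tree findall".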
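
As a tidier alternative to repeating the full URI in braces, find and findall also accept a prefix-to-URI mapping. The following is a minimal sketch, not the only way to do it: the prefix "sm", the NS dictionary, the helper extract_locs, and the inline SAMPLE string are names made up for this illustration. Only the namespace URIs need to match the document.

```python
import xml.etree.ElementTree as ET

# Prefixes are local aliases chosen for this sketch; only the URIs must
# match the ones declared in the sitemap.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
}

# Shortened copy of the XML from the question, inlined so the example
# runs on its own.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
    <url>
        <loc>hello world 1</loc>
        <image:image>
            <image:loc>this is image loc 1</image:loc>
        </image:image>
    </url>
    <url>
        <loc>hello world 2</loc>
        <image:image>
            <image:loc>this is image loc 2</image:loc>
        </image:image>
    </url>
</urlset>"""


def extract_locs(root):
    """Return the text of every <loc> that is a direct child of a <url>."""
    return [
        url.findtext("sm:loc", namespaces=NS)
        for url in root.findall("sm:url", NS)
    ]


root = ET.fromstring(SAMPLE)
for loc in extract_locs(root):
    print(loc)
```

For the real file, replace ET.fromstring(SAMPLE) with ET.parse('test.xml').getroot(). The same mapping reaches the image entries too, for example root.findall("sm:url/image:image/image:loc", NS).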
