How do I use Scrapy XPath with XML namespaces?

Question

How can I use Scrapy XPath to extract the RSS feed content from <content:encoded> ... </content:encoded> (example below)?

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Latest &#8211; Reason.com</title>
    <item>
        <pubDate>Thu, 16 Jan 2020 21:40:23 +0000</pubDate>
        <content:encoded><![CDATA[<p><span style="font-weight: 400">
          Jimmy Meders was scheduled to die by lethal injection today, 
          but the Georgia parole board has granted him clemency.</span></p>]]> 
        </content:encoded>
...

I tried response.xpath('//content:encoded').get(), but it does not work.

Any help would be much appreciated.

Answer 1

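You have to register the XML namespace prefix on the selector before you can use it in an XPath expression. The prefix must be bound to the same URI that the feed declares in its xmlns:content attribute: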

response.selector.register_namespace('content', 
                                     'http://purl.org/rss/1.0/modules/content/')
response.xpath('//content:encoded').getall()

Documentation: register_namespace()
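To see why the prefix has to be bound to a URI, here is a minimal sketch that runs without a spider. It swaps in the standard library's xml.etree.ElementTree for Scrapy's selector, so it illustrates the namespace mechanism rather than Scrapy's own API. The feed text and the names RSS, NAMESPACES, and bodies are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened feed modeled on the one in the question.
RSS = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example feed</title>
    <item>
      <content:encoded><![CDATA[<p>Hello, feed reader.</p>]]></content:encoded>
    </item>
  </channel>
</rss>
"""

# The prefix used in the query is only a local alias. What has to match
# the document is the namespace URI it is mapped to.
NAMESPACES = {"content": "http://purl.org/rss/1.0/modules/content/"}

root = ET.fromstring(RSS.encode("utf-8"))

# Passing the prefix-to-URI mapping plays the role that
# register_namespace() plays in the answer above.
bodies = [el.text for el in root.findall(".//content:encoded", NAMESPACES)]
print(bodies)
```

The CDATA section is returned as plain text, so each entry in bodies is the HTML string stored inside <content:encoded>. In a spider the equivalent step is the register_namespace() call shown in the answer.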
