How do I paginate through this kind of web page?

Question

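I am trying to paginate through the pages of this site (http://www.geny-interim.com/offres/). The problem is that I am using a CSS selector to step through each page, with this code: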

next_page_url = response.css('a.page:nth-child(4)::attr(href)').extract_first()
if next_page_url:
    yield scrapy.Request(next_page_url)

But this only paginates through two pages, after which the CSS selector no longer works as expected. I also tried this:

response.xpath('//*[contains(text(), "›")]/@href/text()').extract_first()

But this raises a ValueError as well. Any help will be upvoted.

Answer 1

The problem is with this XPath expression:

//*[contains(text(), "›")]/@href/text()

because an href attribute has no text() node. Select the attribute itself by ending the expression at /@href.

Here is a working spider that you can adapt to your needs:

# -*- coding: utf-8 -*-
import scrapy


class GenyInterimSpider(scrapy.Spider):
    name = 'geny-interim'
    start_urls = ['http://www.geny-interim.com/offres/']

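    def parse(self, response):
        # Emit one item per job offer listed on the current page.
        for offer in response.xpath('//div[contains(@class,"featured-box")]'):
            yield {
                'title': offer.xpath('.//h3/a/text()').extract_first()
            }
        # Find the "next" link by its "›" label rather than by its position.
        next_page_url = response.xpath('//a[@class="page" and contains(.,"›")]/@href').extract_first()
        if next_page_url:
            # urljoin() turns a relative href into an absolute URL.
            yield scrapy.Request(response.urljoin(next_page_url), callback=self.parse)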
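
One plausible reason the nth-child(4) selector stops after two pages is that the position of the "›" link shifts from page to page, for example once a "previous" link appears. The sketch below illustrates this without Scrapy, using only the Python 3 standard library: it picks the link by its "›" label and resolves it against the page URL, which is roughly what the spider's XPath and response.urljoin() do. The HTML snippets and the /offres/page/N/ URLs are invented for illustration and are not taken from the real site, and PageLinkParser and find_next_page are hypothetical helper names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageLinkParser(HTMLParser):
    """Collect (href, text) pairs for every <a> whose class list contains "page"."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the matching <a> that is currently open
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "page" in (attrs.get("class") or "").split():
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_next_page(html, base_url):
    """Return the absolute URL of the link labelled "›", or None if there is none."""
    parser = PageLinkParser()
    parser.feed(html)
    parser.close()
    for href, text in parser.links:
        if "›" in text:
            return urljoin(base_url, href)
    return None


# Hypothetical pagination markup (assumption, not copied from the site).
# On the first page the "›" link is the 3rd child; on the second it is the 4th.
FIRST_PAGE = (
    '<div><a class="page" href="/offres/page/2/">2</a>'
    '<a class="page" href="/offres/page/3/">3</a>'
    '<a class="page" href="/offres/page/2/">›</a></div>'
)
SECOND_PAGE = (
    '<div><a class="page" href="/offres/">‹</a>'
    '<a class="page" href="/offres/">1</a>'
    '<a class="page" href="/offres/page/3/">3</a>'
    '<a class="page" href="/offres/page/3/">›</a></div>'
)
LAST_PAGE = '<div><a class="page" href="/offres/page/2/">‹</a></div>'

BASE = "http://www.geny-interim.com/offres/"
print(find_next_page(FIRST_PAGE, BASE))
print(find_next_page(SECOND_PAGE, BASE))
print(find_next_page(LAST_PAGE, BASE))
```

Because the lookup depends on the link's label and not on its index among its siblings, it keeps working when the number of pagination links changes, and it returns None on the last page so the crawl stops on its own.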
