Scrapy/Python: run logic after the yielded requests have finished

Question

What I'm doing:

from scrapy import Request


def parse(self, response):
    # Collect the product links from the listing page.
    product_urls = response.css('.product-item a::attr(href)').extract()

    for product_url in product_urls:
        yield Request(product_url, callback=self.parse_product)

    # Runs as soon as the requests are scheduled, not when they finish.
    print("Continue doing stuff....")


def parse_product(self, response):
    title = response.css('h1::text').extract_first()
    print(title)

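In this example, the code first prints

Continue doing stuff....

and only then prints the product titles. I want it the other way around: run the requests and print the titles first, and only then print

Continue doing stuff....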

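Update: @Georgiy asked in the comments whether I need the previously scraped product data.

The answer is yes; this is a simplified example. Once the data has been fetched, I want to manipulate it.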

Answer 1

You can move the logic into the

parse_product

function. For example:

    def parse(self, response):
        product_urls = response.css('.product-item a::attr(href)').extract()

        # Number of product pages still waiting for a response.
        self.count = len(product_urls)
        if self.count == 0:
            self.onEnd()
        else:
            for product_url in product_urls:
                yield Request(product_url, callback=self.parse_product)

    def onEnd(self):
        print("Continue doing stuff....")


    def parse_product(self, response):
        title = response.css('h1::text').extract_first()
        print(title)
        # The last product page to arrive triggers the follow-up logic.
        self.count -= 1
        if self.count == 0:
            self.onEnd()
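
Since the question's update says the scraped data is needed afterwards, the countdown can also collect results as it goes. The following is a minimal sketch in plain Python, not part of the original answer: `PendingTracker`, `finish` and `on_done` are hypothetical names, and the loop at the end only simulates product callbacks arriving.

```python
from typing import Any, Callable, List


class PendingTracker:
    """Counts outstanding callbacks and collects their results (hypothetical helper)."""

    def __init__(self, expected: int, on_done: Callable[[List[Any]], None]) -> None:
        self.remaining = expected
        self.results: List[Any] = []
        self.on_done = on_done
        # Nothing to wait for: fire immediately, like the count == 0 branch above.
        if self.remaining == 0:
            self.on_done(self.results)

    def finish(self, result: Any = None) -> None:
        """Record one completed request; call on_done after the last one."""
        if result is not None:
            self.results.append(result)
        self.remaining -= 1
        if self.remaining == 0:
            self.on_done(self.results)


# Simulate three product callbacks arriving in arbitrary order.
collected: List[List[Any]] = []
tracker = PendingTracker(3, collected.append)
for title in ["Widget B", "Widget A", "Widget C"]:
    tracker.finish(title)
```

In a spider, `parse` would create the tracker with `len(product_urls)` and `parse_product` would call `tracker.finish(title)`, so the follow-up logic receives all titles in one list.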

Answer 2

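Note: I can't comment because I lack the reputation.

The code above works in most cases, but I suggest using

self.crawler.stats

to track the count, because manually decrementing it may lead to race conditions at higher request concurrency. Example code follows.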

    def parse(self, response):
        product_urls = response.css('.product-item a::attr(href)').extract()

        # Expected number of product pages; the stats entry counts processed ones.
        self.count = len(product_urls)
        self.crawler.stats.set_value('processed_product_pages', 0)
        if self.count == 0:
            self.onEnd()
        else:
            for product_url in product_urls:
                yield Request(product_url, callback=self.parse_product)

    def onEnd(self):
        print("Continue doing stuff....")


    def parse_product(self, response):
        title = response.css('h1::text').extract_first()
        print(title)
        self.crawler.stats.inc_value('processed_product_pages')
        # Run the follow-up logic once every expected page has been processed.
        if self.count == self.crawler.stats.get_value('processed_product_pages', 0):
            self.onEnd()
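
One caveat with either counter: a request that fails, for example on a timeout, never reaches parse_product, so the check above would never succeed. A possible workaround is to pass an `errback` to each `Request` and count failures as well. The sketch below illustrates only the bookkeeping, without Scrapy: `FakeStats` is a hypothetical stand-in for the three stats methods used above, and the `failed_product_pages` key and `all_pages_settled` function are assumptions made for this example.

```python
class FakeStats:
    """Hypothetical stand-in for the set_value/inc_value/get_value calls used above."""

    def __init__(self):
        self._values = {}

    def set_value(self, key, value):
        self._values[key] = value

    def inc_value(self, key, count=1, start=0):
        self._values[key] = self._values.get(key, start) + count

    def get_value(self, key, default=None):
        return self._values.get(key, default)


def all_pages_settled(stats, expected):
    """True once successes plus failures account for every yielded request."""
    done = stats.get_value('processed_product_pages', 0)
    failed = stats.get_value('failed_product_pages', 0)
    return done + failed == expected


stats = FakeStats()
expected = 3
checks = []

# Two pages succeed (callback path) and one fails (errback path).
for key in ['processed_product_pages',
            'failed_product_pages',
            'processed_product_pages']:
    stats.inc_value(key)
    checks.append(all_pages_settled(stats, expected))
```

In a spider, the callback and the errback would each increment their own key and then run the same check, so the follow-up logic still fires when some pages fail.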
