How can I use a new context for each request with scrapy-playwright?


This is how I'm doing it, but I'm not sure whether it creates and uses a new context for every new request:

class TestSpider(scrapy.Spider):
    name = 'test'
    start_urls = [...]
    cnt = 0

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url=url,
                                 meta={'playwright': True,
                                       'playwright_context': f'{self.cnt}'})

    def parse(self, response):
        self.cnt += 1
        for res in response.xpath('//div[@id="contenu"]'):
            url = res.xpath('.//h2/a/@href').get()
            yield scrapy.Request(url=url,
                                 callback=self.get_content,
                                 meta={'playwright': True,
                                       'playwright_context': f'{self.cnt}'})

Does this code do what I want, or is it wrong?

python scrapy playwright-python
1 Answer

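self.cnt += 1 should be inside the for loop, before each request is yielded, so that every request carries a freshly incremented number and therefore gets a new context. In your version, every request from start_requests uses context '0', and all the requests yielded by a single parse call share the same number, because the counter only changes once per response.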

import scrapy


class TestSpider(scrapy.Spider):
    name = 'test'
    start_urls = [...]
    cnt = 0

    def start_requests(self):
        for url in self.start_urls: 
            self.cnt += 1   # <------ increment the count here
            yield scrapy.Request(url=url,
                                 meta={'playwright': True,
                                       'playwright_context': f'{self.cnt}'})

    def parse(self, response):
        for res in response.xpath('//div[@id="contenu"]'):
            url = res.xpath('.//h2/a/@href').get()
            self.cnt += 1    # <------ increment the count here
            yield scrapy.Request(url=url,
                                 callback=self.get_content,
                                 meta={'playwright': True,
                                       'playwright_context': f'{self.cnt}'})
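
If you don't want to remember the increment at every place that yields a request, one option is to move the counter into a small helper. This is only a sketch: new_context_meta and the "ctx-" prefix are names I made up for illustration, not part of scrapy-playwright. The only thing it relies on is what the code above already uses, namely that a request's 'playwright_context' meta value names the context.

```python
import itertools

# Hypothetical helper, not part of scrapy-playwright: a single shared counter,
# so no call site can forget to increment it.
_context_ids = itertools.count(1)


def new_context_meta(**extra):
    """Return request meta that names a context no earlier call has used."""
    meta = {
        "playwright": True,
        "playwright_context": f"ctx-{next(_context_ids)}",
    }
    meta.update(extra)  # room for any other meta keys the request needs
    return meta


first = new_context_meta()
second = new_context_meta()
print(first["playwright_context"], second["playwright_context"])  # ctx-1 ctx-2
```

In the spider you would then write meta=new_context_meta() in both start_requests and parse instead of building the dict by hand. Keep in mind that a context per request means many contexts over a long crawl, so it is probably worth checking the scrapy-playwright documentation on closing contexts and on limiting how many are open at once.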