How to make scrapy-playwright retry when an error occurs

Problem description

I'm trying to write a crawler that uses scrapy-playwright. In a previous project I only used Scrapy and set RETRY_TIMES = 3: even when I couldn't reach the resource I needed, the spider would attempt the request 3 times before closing.

I tried the same approach here, but it doesn't seem to work: on the very first error the spider closes. Can anyone help? What should I do so that the spider retries a URL as many times as needed?

Here is the relevant part of my settings.py:

import random  # needed for the random DOWNLOAD_DELAY below

RETRY_ENABLED = True
RETRY_TIMES = 3
DOWNLOAD_TIMEOUT = 60
DOWNLOAD_DELAY = random.uniform(0, 1)  # a single random delay, chosen once when settings load

DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

Thanks in advance!

python scrapy playwright scrapy-playwright
1 Answer

Make sure you catch and log exceptions raised on the Playwright side. That will help you determine whether the Playwright code itself is hitting an error that triggers the spider shutdown.

import random  # needed for the random DOWNLOAD_DELAY below

RETRY_ENABLED = True
RETRY_TIMES = 3
DOWNLOAD_TIMEOUT = 60
DOWNLOAD_DELAY = random.uniform(0, 1)  # a single random delay, chosen once when settings load

DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
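
To catch and log those exceptions, one option is to attach an errback to each request. Below is a minimal sketch, assuming requests are routed through Playwright via the playwright meta key; the spider name, the URL, and the use of Scrapy's get_retry_request helper to re-schedule failed requests are illustrative, not taken from the question:

import scrapy
from scrapy.downloadermiddlewares.retry import get_retry_request


class ExampleSpider(scrapy.Spider):
    name = "example"  # placeholder spider name

    def start_requests(self):
        yield scrapy.Request(
            "https://example.com",  # placeholder URL
            meta={"playwright": True},
            callback=self.parse,
            errback=self.errback,  # invoked when the download (including Playwright) fails
        )

    def parse(self, response):
        self.logger.info("got %s", response.url)

    def errback(self, failure):
        # Log the underlying exception so you can see why the request failed
        # instead of the spider just closing.
        self.logger.error("Request failed: %r", failure.value)

        # Optionally re-schedule the request with Scrapy's built-in helper;
        # it respects RETRY_TIMES and returns None once the limit is reached.
        retry_request = get_retry_request(
            failure.request, spider=self, reason=repr(failure.value)
        )
        if retry_request is not None:
            yield retry_request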

You have DOWNLOAD_TIMEOUT set to 60 seconds, which is fairly generous, but make sure it is not too short for the kind of requests you are making: if pages take a long time to respond, the timeout can interact with the retry behaviour.
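
If slow pages are the issue, it can also help to set the Playwright-side navigation timeout explicitly so it matches DOWNLOAD_TIMEOUT; a sketch, assuming scrapy-playwright's PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT setting (specified in milliseconds):

# settings.py (sketch): keep Scrapy's and Playwright's timeouts aligned
DOWNLOAD_TIMEOUT = 60  # seconds, enforced by Scrapy
PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT = 60 * 1000  # milliseconds, enforced by Playwright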
