I'm trying to write a spider that uses scrapy-playwright. In a previous project I used plain Scrapy and set
RETRY_TIMES = 3
. Even when I couldn't reach the resource I needed, the spider would try the request 3 times before shutting down.
I tried the same approach here, but it doesn't seem to work: the spider shuts down on the very first error. Can anyone help me? What should I do so that the spider retries a URL as many times as needed?
Here is the relevant part of my settings.py:
import random  # needed for the DOWNLOAD_DELAY expression below

RETRY_ENABLED = True
RETRY_TIMES = 3
DOWNLOAD_TIMEOUT = 60
DOWNLOAD_DELAY = random.uniform(0, 1)  # note: evaluated once at startup, not per request
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
Thanks in advance!
Make sure you catch and log exceptions raised in your Playwright code. That will help you determine whether the Playwright script itself is hitting an error that triggers the spider shutdown.
RETRY_ENABLED = True
RETRY_TIMES = 3
DOWNLOAD_TIMEOUT = 60
DOWNLOAD_DELAY = random.uniform(0, 1)  # introduce a random delay between requests
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
You have set DOWNLOAD_TIMEOUT to 60 seconds, which is fairly generous. Still, make sure the timeout is not too short for the kind of requests you are making: if a request regularly takes longer than the timeout, it will fail with a timeout error, and that can interact with the retry behaviour.
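A likely root cause here: Scrapy's RetryMiddleware retries HTTP error codes and a fixed set of network exceptions, and a Playwright navigation timeout is not on that list, so the request fails permanently on the first error. Since Scrapy 2.10 that list is configurable via the RETRY_EXCEPTIONS setting. A sketch, assuming you are on Scrapy ≥ 2.10 and that your failures surface as Playwright's TimeoutError:

```python
# settings.py additions -- sketch, assumes Scrapy >= 2.10
from scrapy.settings.default_settings import RETRY_EXCEPTIONS

# Keep the built-in retryable exceptions and add Playwright's TimeoutError
# (referenced by dotted path so settings.py does not import playwright itself).
RETRY_EXCEPTIONS = list(RETRY_EXCEPTIONS) + [
    "playwright.async_api.TimeoutError",
]
```

If the pages are simply slow rather than broken, you can also raise the browser-side navigation timeout via scrapy-playwright's PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT setting (in milliseconds) so that fewer requests time out in the first place.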