ReactorNotRestartable error when running Scrapy inside a while loop

Question. Votes: 12, Answers: 3

I get a twisted.internet.error.ReactorNotRestartable error when I run the following code:

from time import sleep
from scrapy import signals
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from scrapy.xlib.pydispatch import dispatcher

result = None

def set_result(item):
    result = item

while True:
    process = CrawlerProcess(get_project_settings())
    dispatcher.connect(set_result, signals.item_scraped)

    process.crawl('my_spider')
    process.start()

    if result:
        break
    sleep(3)

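It works the first time through the loop, then I get the error. I create a new process variable on every iteration, so what is the problem?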

python python-2.7 scrapy twisted
3 Answers
5
votes

By default, CrawlerProcess.start() stops the Twisted reactor it created once all crawlers have finished.

If you create process in each iteration, you should call process.start(stop_after_crawl=False).

Another option is to manage the Twisted reactor yourself and use CrawlerRunner. The docs have an example of doing this.
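A minimal sketch of the CrawlerRunner approach, assuming the classic Deferred-based API and a project spider named 'my_spider' as in the question. The function name crawl_until_result, the delay parameter and the found list are hypothetical names chosen for illustration. On recent Scrapy versions that default to the asyncio reactor, the matching reactor may need to be installed before the reactor import, so treat this as a starting point rather than a drop-in solution.

```python
def crawl_until_result(spider_name, delay=3.0):
    """Re-run a spider within a single reactor run until it scrapes an item.

    `crawl_until_result` and `delay` are illustrative names, not Scrapy API.
    """
    # Imported inside the function because importing the reactor installs it.
    from scrapy import signals
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.log import configure_logging
    from scrapy.utils.project import get_project_settings
    from twisted.internet import defer, reactor, task

    settings = get_project_settings()
    configure_logging(settings)
    runner = CrawlerRunner(settings)
    found = []  # scraped items end up here

    def on_item(item):
        found.append(item)

    @defer.inlineCallbacks
    def loop():
        while not found:
            # A fresh crawler per attempt, all sharing the same running reactor.
            crawler = runner.create_crawler(spider_name)
            crawler.signals.connect(on_item, signal=signals.item_scraped)
            yield runner.crawl(crawler)
            if not found:
                # Non-blocking pause; time.sleep() would freeze the reactor.
                yield task.deferLater(reactor, delay, lambda: None)

    d = loop()
    d.addErrback(lambda failure: failure.printTraceback())
    d.addBoth(lambda _: reactor.stop())
    reactor.run()  # started exactly once; returns after reactor.stop()
    return found[0] if found else None
```

Run from inside the Scrapy project directory, crawl_until_result('my_spider') would block until one crawl yields an item. The reactor is started a single time, which is what avoids ReactorNotRestartable.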


2
votes

I was able to solve the problem like this. process.start() should be called only once.

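from scrapy import signals
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from pydispatch import dispatcher  # scrapy.xlib.pydispatch is deprecated

result = None

def set_result(item):
    # Declare the global, otherwise the assignment only binds a local name.
    global result
    result = item

# Create the process and connect the signal once; no loop is needed here.
process = CrawlerProcess(get_project_settings())
dispatcher.connect(set_result, signals.item_scraped)

process.crawl('my_spider')

# start() runs the reactor and blocks until the crawl finishes; call it only once.
process.start()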
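Separately from the reactor error, the set_result handler in the question assigns to result inside a function without declaring it global, so the module-level result would stay None and the `if result:` check could never succeed. The sketch below shows the difference in plain Python; the two handler names and the sample dict are hypothetical stand-ins for a scraped item.

```python
result = None

def set_result_local(item):
    # Binds a new local variable; the module-level `result` is untouched.
    result = item

def set_result_global(item):
    # `global` makes the assignment target the module-level name.
    global result
    result = item

set_result_local({"title": "first"})
after_local = result    # the module-level value has not changed

set_result_global({"title": "second"})
after_global = result   # now refers to the dict that was passed in
```

An alternative that avoids `global` altogether is to append items to a module-level list, since mutating an existing object does not require rebinding the name.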

1
votes

See http://crawl.blog/scrapy-loop/

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from twisted.internet import reactor
from twisted.internet.task import deferLater

# MySpider is assumed to be your own spider class, imported from your project.

def sleep(_result, seconds):
    """Non-blocking sleep: returns a Deferred that fires after `seconds`."""
    return deferLater(reactor, seconds, lambda: None)

process = CrawlerProcess(get_project_settings())

def _crawl(_result, spider):
    # Schedule one crawl, then wait and schedule the next one on the same reactor.
    deferred = process.crawl(spider)
    deferred.addCallback(lambda results: print('waiting 100 seconds before restart...'))
    deferred.addCallback(sleep, seconds=100)
    deferred.addCallback(_crawl, spider)
    return deferred


_crawl(None, MySpider)
process.start()