Creating new requests for a Scrapy spider on a schedule

Question

Using pika, I fetch URLs from RabbitMQ and try to create new requests for my Scrapy spider. When I start the spider with scrapy crawl spider, it stays open (because my custom extension raises DontCloseSpider()), but no requests are ever created for the spider. My extension:

import pika
from scrapy import signals
from scrapy.http import Request
from scrapy.exceptions import DontCloseSpider


class AddRequestExample:

    def __init__(self, stats):
        self.stats = stats

    @classmethod
    def from_crawler(cls, crawler):
        s = cls(crawler)
        crawler.signals.connect(s.spider_idle, signal=signals.spider_idle)
        return s


    def spider_idle(self, spider):
        connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
        channel = connection.channel()
        try:
            url = channel.basic_get(queue='hello')[2]
            url = url.decode()
            crawler.engine.crawl(Request(url), self)
        except Exception:
            pass
        raise DontCloseSpider()

My spider:

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "spider"

    def parse(self, response):
        yield {
            'url': response.url,
        }
Tags: python, scrapy, rabbitmq, pika
1 Answer

It looks like you are trying to copy the approach from this answer. In that case you need to define a callback for the request: because you are handling the spider_idle signal from an extension (not from inside the spider itself), Scrapy cannot infer a callback, so it should be the spider.parse method.

def spider_idle(self, spider):
    ....
    try:
        url = channel.basic_get(queue='hello')[2]
        url = url.decode()
        spider.crawler.engine.crawl(Request(url=url, callback=spider.parse), self)
    except Exception:
    ....
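One more detail worth noting: in pika, channel.basic_get returns a 3-tuple (method_frame, header_frame, body), and all three elements are None when the queue is empty, so the [2] body should be checked for None rather than swallowing every exception with a bare except. A minimal sketch of that extraction step, using a hypothetical stub in place of a real pika channel:

```python
# Sketch of the URL-extraction step. FakeChannel is a hypothetical
# stand-in for pika's BlockingChannel: basic_get returns
# (method_frame, header_frame, body), with (None, None, None) when
# the queue is empty.

class FakeChannel:
    """Hypothetical stub mimicking pika's basic_get return shape."""
    def __init__(self, messages):
        self.messages = list(messages)

    def basic_get(self, queue):
        if self.messages:
            return (object(), object(), self.messages.pop(0))
        return (None, None, None)  # empty queue


def next_url(channel, queue='hello'):
    """Return the next URL from the queue as str, or None if empty."""
    body = channel.basic_get(queue=queue)[2]
    if body is None:
        return None
    return body.decode()


channel = FakeChannel([b'http://example.com/page1'])
print(next_url(channel))  # http://example.com/page1
print(next_url(channel))  # None (queue drained)
```

With a guard like this, the spider_idle handler only calls engine.crawl when a URL was actually dequeued, and the DontCloseSpider exception keeps the spider alive for the next idle cycle.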