Scrapy NameError when running multiple functions

Question (Votes: 0, Answers: 2)

I am trying to run the following code, but I get the error "NameError: name 'scrapedate' is not defined".

import scrapy
from datetime import datetime, timedelta
from dogscraper.items import DogItem

racedate = '2024-01-25'
days = 2
realdate = datetime.strptime(racedate, '%Y-%m-%d').date()
scrape_list = [(realdate - timedelta(days=x)).strftime('%Y-%m-%d') for x in range(days)]

class DogspiderSpider(scrapy.Spider):
    name = "dogspider"
    allowed_domains = ["www.thedogs.com.au"]
    start_urls = ["https://www.thedogs.com.au/racing/"+racedate]

    def parse(self, response):
        for scrapedate in scrape_list:
            next_dateurl = 'https://www.thedogs.com.au/racing/' + scrapedate
            yield response.follow(next_dateurl, callback=self.parse_date)


    def parse_date(self, response):
        nswmeetings = response.css('table.meeting-grid')[0]
        nswmeetings = nswmeetings.css('td.meetings-venues__name')

        for meeting in nswmeetings:
            meeting_url = meeting.css('a::attr(href)').get()
            nextmeeting = 'https://www.thedogs.com.au' + meeting_url
            yield response.follow(nextmeeting, callback=self.parse_meeting)


    def parse_meeting(self, response):
        races = response.css('a.race-box.race-box--result')
        for race in races:
            race_url = race.css('a.race-box.race-box--result::attr(href)').get()
            nextrace = 'https://www.thedogs.com.au' + race_url
            yield response.follow(nextrace, callback=self.parse_race)


    def parse_race(self, response):
        dogs = response.css('tr.accordion__anchor.race-runner')
        dog_item = DogItem()

        for dog in dogs:
            dog_item['date'] = scrapedate

NameError: name 'scrapedate' is not defined

Essentially, I want to take each scrapedate from scrape_list inside parse and use it later, when parse_race runs, as dog_item['date'] = scrapedate.

python function web-scraping scrapy web-crawler
2 Answers

0 votes

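Looking at your code, I can see that you are trying to use scrapedate inside the parse_race function (a generator), but it is declared in the parse function (another generator). This raises a NameError because scrapedate is a local variable that exists only inside the parse generator. So if you want to use scrapedate in parse_race, you have to make it a class attribute: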

class DogspiderSpider(scrapy.Spider):
    # ... (your existing code)
    scrapedate = None  # Initialize to None

    def parse(self, response):
        for scrapedate in scrape_list:
            # ... (your existing code)
            self.scrapedate = scrapedate  # assign the attribute
            yield response.follow(next_dateurl, callback=self.parse_date)

    # ..... (Your existing code)


    def parse_race(self, response):
        # ... (your existing code)
        dog_item['date'] = self.scrapedate  # Access the attribute
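
One thing to watch with a shared attribute: Scrapy runs callbacks later, once responses arrive, so by the time parse_race runs, self.scrapedate may already hold the value from a later loop iteration. Below is a minimal sketch of that effect in plain Python. It does not use Scrapy; FakeSpider and crawl are hypothetical stand-ins, and the "queue everything, then run callbacks" order is a simplifying assumption about how requests get scheduled.

```python
from collections import deque


class FakeSpider:
    """Plain-Python stand-in for a spider; not a Scrapy class."""

    scrapedate = None

    def parse(self, scrape_list):
        for scrapedate in scrape_list:
            # One attribute shared by every pending "request".
            self.scrapedate = scrapedate
            # A "request" here is just the callback to run later.
            yield self.parse_race

    def parse_race(self):
        # Reads whatever value the attribute holds when the callback runs.
        return {"date": self.scrapedate}


def crawl(spider, scrape_list):
    # Assumption: all requests are queued before any callback runs.
    pending = deque(spider.parse(scrape_list))
    return [callback() for callback in pending]


items = crawl(FakeSpider(), ["2024-01-25", "2024-01-24"])
print(items)  # both items carry the last date assigned in the loop
```

In this simulation both items end up with '2024-01-24', which suggests the date is safer attached to each request than to the spider.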

0 votes

Thanks, @SIM.

I was able to pass the scrape date along using meta:

# ...
yield response.follow(next_dateurl, callback=self.parse_date, meta={'scrapedate': scrapedate})

then

# ...
yield response.follow(nextmeeting, callback=self.parse_meeting, meta={'scrapedate': response.meta['scrapedate']})
# ...
yield response.follow(nextrace, callback=self.parse_race, meta={'scrapedate': response.meta['scrapedate']})

I can then read it back with:

dog_item['date'] = response.meta['scrapedate']
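
To see why forwarding the value on every request keeps the dates apart, here is a small plain-Python sketch of the same chain. It does not import Scrapy: FakeRequest, FakeResponse, FakeSpider, crawl and SCRAPE_LIST are hypothetical stand-ins that only mimic the idea that a response exposes the meta of the request that produced it.

```python
from collections import deque
from dataclasses import dataclass, field

SCRAPE_LIST = ["2024-01-25", "2024-01-24"]


@dataclass
class FakeRequest:
    """Stand-in for a request: a callback plus per-request meta."""
    callback: object
    meta: dict = field(default_factory=dict)


@dataclass
class FakeResponse:
    """Stand-in for a response: carries the meta of its request."""
    meta: dict


class FakeSpider:
    def parse(self, response):
        for scrapedate in SCRAPE_LIST:
            # Attach the date to this request only.
            yield FakeRequest(self.parse_date, meta={"scrapedate": scrapedate})

    def parse_date(self, response):
        # Forward the date to the next callback in the chain.
        yield FakeRequest(
            self.parse_race, meta={"scrapedate": response.meta["scrapedate"]}
        )

    def parse_race(self, response):
        yield {"date": response.meta["scrapedate"]}


def crawl(spider):
    # Tiny FIFO scheduler: run callbacks, queue new requests, collect items.
    pending = deque([FakeRequest(spider.parse)])
    items = []
    while pending:
        request = pending.popleft()
        for result in request.callback(FakeResponse(meta=request.meta)):
            if isinstance(result, FakeRequest):
                pending.append(result)
            else:
                items.append(result)
    return items


items = crawl(FakeSpider())
print(items)
```

Each item keeps the date of the request chain it came from. In real Scrapy, Request also accepts a cb_kwargs dictionary that hands values to the callback as keyword arguments, which may be a tidier option than meta; the documentation for your installed version is the place to confirm the details.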