Scrapy: loop over each element and export to JSON


The code below finds all the elements I'm looking for, but I'm struggling to turn this into a loop where the data is put into a DataFrame and exported to a JSON file. All of my commands are run from the command line to pull in the data I need. How can I get it to export JSON?

import scrapy
from scrapy.item import Field, Item
from scrapy.selector import Selector


class WeatherdataSpider(scrapy.Spider):
    name = 'weatherdata'
    allowed_domains = ['https://www.nflweather.com/']
    start_urls = ['http://https://www.nflweather.com//']

    def parse(self, response):
        #pass
        trans_table = {ord(c): None for c in u'\n\t\t'}
        Datetime = '    '.join(s.strip().translate(trans_table) for s in response.xpath('//div[@class="fw-bold text-wrap"]/text()').extract())
        awayTeam = response.xpath('//span[@class="fw-bold"]/text()').extract()
        homeTeam = response.xpath('//span[@class="fw-bold ms-1"]/text()').extract()
        TempProb    = response.xpath('//div[@class="mx-2"]/span/text()').extract()
        windspeed = response.xpath('//div[@class="text-break col-md-4 mb-1 px-1 flex-centered"]/span/text()').extract()
python scrapy

1 Answer

Try this:

import re
import scrapy


class WeatherdataSpider(scrapy.Spider):
    name = 'weatherdata'
    allowed_domains = ['www.nflweather.com']
    start_urls = ['https://www.nflweather.com/']

    def parse(self, response):
        # Translation table that deletes embedded newlines and tabs
        trans_table = {ord(c): None for c in '\n\t'}

        date = response.xpath('//div[@class="fw-bold text-wrap"]/text()').extract()
        date = [d.strip().translate(trans_table) for d in date]

        wind = response.xpath('//div[contains(@class, "text-break")]/span[not(ancestor::span)][2]/text()[1]').extract()
        wind = [re.sub(" \xa0", "", w) for w in wind]

        away = response.xpath('//span[@class="fw-bold"]/text()').extract()
        home = response.xpath('//span[@class="fw-bold ms-1"]/text()').extract()
        temp = response.xpath('//div[@class="mx-2"][1]/span/text()').extract()

        # Iterate over the parallel lists, yielding one item per game
        for d, h, a, t, w in zip(date, home, away, temp, wind):
            yield {
                "date": d,
                "home": h,
                "away": a,
                "temp": t,
                "wind": w,
            }
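The whitespace cleanup relies on `str.translate`: the table maps each unwanted code point to `None`, which deletes it. A standalone illustration (the raw string is a hypothetical scraped value):

```python
# str.translate deletes every character whose code point maps to None.
trans_table = {ord(c): None for c in '\n\t'}

# Hypothetical raw text as it might come back from the date XPath
raw = "\n\t\t01/20/24 04:30 PM EST\n\t\t"
clean = raw.strip().translate(trans_table)
print(repr(clean))  # '01/20/24 04:30 PM EST'
```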

I understand why you wanted to use a DataFrame, but with this approach it isn't necessary. You scrape all the items on the page into separate lists, then iterate over them in parallel, yielding one item at a time.
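The parallel-iteration pattern can be sketched without Scrapy at all; plain lists stand in for the XPath results (values below are taken from the sample output):

```python
# zip() pairs the i-th entry of each list, so each tuple is one game.
dates = ["01/20/24 04:30 PM EST", "01/20/24 08:15 PM EST"]
homes = ["Ravens", "49ers"]
aways = ["Texans", "Packers"]

items = [
    {"date": d, "home": h, "away": a}
    for d, h, a in zip(dates, homes, aways)
]
print(items[0])  # {'date': '01/20/24 04:30 PM EST', 'home': 'Ravens', 'away': 'Texans'}
```

Note that `zip` stops at the shortest list, so if one XPath matches fewer nodes than the others, trailing games are silently dropped.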

Run it like this:

scrapy crawl weatherdata -O weather.json

This dumps the data to a JSON file (`-O` overwrites the file; lowercase `-o` appends to it):

[
  {
    "date": "01/20/24 04:30 PM EST",
    "home": "Ravens",
    "away": "Texans",
    "temp": "27 °F",
    "wind": "16 mph"
  },
  {
    "date": "01/20/24 08:15 PM EST",
    "home": "49ers",
    "away": "Packers",
    "temp": "58 °F",
    "wind": "5 mph"
  },
  {
    "date": "01/21/24 03:00 PM EST",
    "home": "Lions",
    "away": "Buccaneers",
    "temp": "23 °F",
    "wind": "10 mph"
  },
  {
    "date": "01/21/24 06:30 PM EST",
    "home": "Bills",
    "away": "Chiefs",
    "temp": "18 °F",
    "wind": "10 mph"
  }
]

I piped the JSON through jq to make it easier to read:

jq . weather.json
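As a quick sanity check in Python instead of jq, you can parse the feed and inspect a record. The embedded string below mirrors the first entry of the sample output, so the snippet is self-contained:

```python
import json

# One record in the same shape as the weather.json feed above
feed = '''[
  {"date": "01/20/24 04:30 PM EST", "home": "Ravens",
   "away": "Texans", "temp": "27 °F", "wind": "16 mph"}
]'''
games = json.loads(feed)
print(len(games), games[0]["away"])  # 1 Texans
```

With the real file, replace `json.loads(feed)` with `json.load(open("weather.json"))`.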