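The code below finds all the elements I'm looking for, but I'm struggling to turn it into a loop that puts the data into a dataframe and exports it to a JSON file. Everything is run from the command line to pull in the data I need. How can I get it to export JSON?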
import scrapy
from scrapy.item import Field, Item
from scrapy.selector import Selector


class WeatherdataSpider(scrapy.Spider):
    name = 'weatherdata'
    allowed_domains = ['https://www.nflweather.com/']
    start_urls = ['http://https://www.nflweather.com//']

    def parse(self, response):
        #pass
        trans_table = {ord(c): None for c in u'\n\t\t'}
        Datetime = ' '.join(s.strip().translate(trans_table) for s in response.xpath('//div[@class="fw-bold text-wrap"]/text()').extract())
        awayTeam = response.xpath('//span[@class="fw-bold"]/text()').extract()
        homeTeam = response.xpath('//span[@class="fw-bold ms-1"]/text()').extract()
        TempProb = response.xpath('//div[@class="mx-2"]/span/text()').extract()
        windspeed = response.xpath('//div[@class="text-break col-md-4 mb-1 px-1 flex-centered"]/span/text()').extract()
Try this:
import re
import scrapy


class WeatherdataSpider(scrapy.Spider):
    name = 'weatherdata'
    # allowed_domains takes bare domain names, not full URLs
    allowed_domains = ['www.nflweather.com']
    start_urls = ['https://www.nflweather.com/']

    def parse(self, response):
        # Map newline and tab characters to None so translate() removes them
        trans_table = {ord(c): None for c in '\n\t'}
        date = response.xpath('//div[@class="fw-bold text-wrap"]/text()').extract()
        date = [d.strip().translate(trans_table) for d in date]
        wind = response.xpath('//div[contains(@class, "text-break")]/span[not(ancestor::span)][2]/text()[1]').extract()
        # Drop the space plus non-breaking space found in the wind text
        wind = [re.sub(" \xa0", "", w) for w in wind]
        away = response.xpath('//span[@class="fw-bold"]/text()').extract()
        home = response.xpath('//span[@class="fw-bold ms-1"]/text()').extract()
        temp = response.xpath('//div[@class="mx-2"][1]/span/text()').extract()
        # Walk the lists in parallel and yield one item per game
        for date, home, away, temp, wind in zip(date, home, away, temp, wind):
            yield {
                "date": date,
                "home": home,
                "away": away,
                "temp": temp,
                "wind": wind,
            }
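I understand why you might want to use a dataframe, but with this approach it isn't necessary. You scrape all the items on the page into separate lists, then iterate over them in parallel, yielding one item at a time.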
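To make the parallel iteration concrete, here is a minimal sketch that runs without Scrapy. The hard-coded lists are stand-ins for what the XPath queries return, and `build_items` is a made-up helper name used only for illustration:

```python
# Stand-ins for the lists returned by response.xpath(...).extract()
dates = ["01/20/24 04:30 PM EST", "01/20/24 08:15 PM EST"]
homes = ["Ravens", "49ers"]
aways = ["Texans", "Packers"]


def build_items(dates, homes, aways):
    """Yield one dict per game by walking the lists in parallel."""
    for date, home, away in zip(dates, homes, aways):
        yield {"date": date, "home": home, "away": away}


items = list(build_items(dates, homes, aways))
print(items[0])
```

One thing to keep in mind: `zip` stops at the shortest list, so if one XPath query matches fewer elements than the others, the extra rows are dropped silently.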
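Run it like this:
scrapy crawl weatherdata -O weather.json
This dumps the data to a JSON file. The capital -O overwrites weather.json if it already exists (lowercase -o appends instead).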
[
  {
    "date": "01/20/24 04:30 PM EST",
    "home": "Ravens",
    "away": "Texans",
    "temp": "27 °F",
    "wind": "16 mph"
  },
  {
    "date": "01/20/24 08:15 PM EST",
    "home": "49ers",
    "away": "Packers",
    "temp": "58 °F",
    "wind": "5 mph"
  },
  {
    "date": "01/21/24 03:00 PM EST",
    "home": "Lions",
    "away": "Buccaneers",
    "temp": "23 °F",
    "wind": "10 mph"
  },
  {
    "date": "01/21/24 06:30 PM EST",
    "home": "Bills",
    "away": "Chiefs",
    "temp": "18 °F",
    "wind": "10 mph"
  }
]
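If you would rather not pass the output flag on every run, the export can, as far as I know, also be configured through Scrapy's FEEDS setting. A sketch of what that might look like in the project's settings.py, assuming a Scrapy version recent enough to support the "overwrite" key (the file name simply mirrors the one used on the command line):

```python
# settings.py fragment (assumed location; a spider's custom_settings
# dict should accept the same FEEDS entry)
FEEDS = {
    "weather.json": {
        "format": "json",    # same exporter the .json extension selects
        "encoding": "utf8",  # write characters such as the degree sign as-is
        "overwrite": True,   # replace the file on each run
    },
}
```

With this in place, a plain `scrapy crawl weatherdata` should be enough to produce the file.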
I piped the JSON through jq to make it easier to read:
jq . weather.json
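If jq is not installed, Python's standard json module can produce similar output. A minimal sketch: `pretty` is a made-up helper name, and the single sample record stands in for the contents of weather.json:

```python
import json


def pretty(records):
    """Return records as indented JSON, keeping non-ASCII characters readable."""
    return json.dumps(records, indent=2, ensure_ascii=False)


# Sample record standing in for the exported file's contents
sample = [
    {
        "date": "01/20/24 04:30 PM EST",
        "home": "Ravens",
        "away": "Texans",
        "temp": "27 °F",
        "wind": "16 mph",
    }
]
text = pretty(sample)
print(text)

# To format the exported file instead:
# with open("weather.json", encoding="utf-8") as f:
#     print(pretty(json.load(f)))
```

For a one-off look at the file, `python -m json.tool weather.json` does roughly the same thing from the command line.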