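My code does not run. I am trying to scrape multiple pages across different brands (categories). The page I start from lists all the brands, arranged in groups by the brand's first letter. Each brand's page, in turn, spans several pages of products.
I tried to write code that uses counters to walk through the brands: if a given first-letter group has no more brands, it moves on to the next group. (The requests are fine; the problem is in the code. The scraping itself works, and the code only fails when I attempt this pagination.)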
import scrapy
from scrapy import Request


class MlSpider(scrapy.Spider):
    name = "ml"

    def start_requests(self):
        yield scrapy.Request('https://lista.mercadolivre.com.br/produtos-cabelo')

    def parse(self, response, **kwargs):
        cgroup = 1
        cbrand = 1
        num_group = response.xpath(f'//div[@class="ui-search-search-modal-filter-group"][{cgroup}]').get()
        for m in num_group:
            link_marca = m.xpath(f'.//a[@class="ui-search-search-modal-filter ui-search-link"][{cbrand}]/@href').get()
            if link_marca:
                yield scrapy.Request(url=link_marca)
                for i in response.xpath('.//div[@class="ui-search-result__content"]'):
                    marca = i.xpath('.//span[@class="ui-search-item__brand-discoverability ui-search-item__group__element"]/text()').get()
                    title = i.xpath('.//h2/text()').get()
                    real = i.xpath('.//span[@class="andes-money-amount ui-search-price__part ui-search-price__part--medium andes-money-amount--cents-superscript"]//span[@class="andes-money-amount__fraction"]/text()').get()
                    centavo = i.xpath('//span[@class="andes-money-amount ui-search-price__part ui-search-price__part--medium andes-money-amount--cents-superscript"]//span[@class="andes-money-amount__cents andes-money-amount__cents--superscript-24"]/text()').get()
                    value = f'R$ {real},{centavo}'
                    link = i.xpath('.//a/@href').get()
                    yield {
                        'marca': marca,
                        'title': title,
                        'value': value,
                        'link': link
                    }
                next_page = response.xpath('//a[contains(@title,"Seguinte")]/@href').get()
                if next_page:
                    yield scrapy.Request(url=next_page, callback=self.parse)
                    cbrand += 1
            else:
                cgroup += 1
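The pagination does not work because of where you placed the next-page logic. I have edited your code so that it starts from the brands page, follows the link to each brand, and scrapes the product details there; if a brand has a next page, it follows that link and scrapes the products on that page as well. I also simplified some of your selectors, as shown below: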
import scrapy


class ProductsSpider(scrapy.Spider):
    name = "products"
    allowed_domains = ["lista.mercadolivre.com.br"]
    # Start from the page that lists every brand.
    start_urls = [
        "https://lista.mercadolivre.com.br/produtos-cabelo_FiltersAvailableSidebar?filter=BRAND"
    ]

    def parse(self, response):
        # Collect the link of every brand and request each brand's product listing.
        brand_links = response.xpath("//div[@class='ui-search-search-modal-grid-columns']/a/@href").getall()
        for link in brand_links:
            yield scrapy.Request(link, callback=self.parse_products)

    def parse_products(self, response):
        # Extract one item per product card on the current listing page.
        for i in response.xpath('.//div[@class="ui-search-result__content"]'):
            marca = i.xpath('.//span[contains(@class, "ui-search-item__brand-discoverability")]/text()').get()
            title = i.xpath(".//h2/text()").get()
            real = i.xpath('.//span[@class="andes-money-amount__fraction"]/text()').get()
            centavo = i.xpath('.//span[contains(@class, "andes-money-amount__cents")]/text()').get()
            value = f"R$ {real},{centavo}"
            link = i.xpath(".//a/@href").get()
            yield {
                "marca": marca,
                "title": title,
                "value": value,
                "link": link,
            }

        # After all products on this page, follow the "Seguinte" (next) link, if any.
        next_page = response.xpath('//a[contains(@title,"Seguinte")]/@href').get()
        if next_page:
            yield scrapy.Request(url=next_page, callback=self.parse_products)
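
One thing to watch in the spider above: `.get()` returns `None` when a selector matches nothing, so a listing without a cents element would produce a value like `R$ 29,None`. Below is a minimal sketch of a helper that guards against that. The name `format_price` is hypothetical (it is not part of Scrapy or of the original code), and treating missing cents as `00` is my assumption about how you would want such prices displayed.

```python
from typing import Optional


def format_price(real: Optional[str], centavo: Optional[str]) -> Optional[str]:
    """Build 'R$ <reais>,<centavos>' from the scraped text fragments.

    Hypothetical helper: returns None when the main amount is missing,
    and assumes '00' when the cents fragment is missing or blank.
    """
    if real is None:
        # No price fraction was found on the product card.
        return None
    real = real.strip()
    # Fall back to "00" if cents are None, empty, or whitespace only.
    centavo = (centavo or "").strip() or "00"
    return f"R$ {real},{centavo}"


if __name__ == "__main__":
    print(format_price("29", "90"))
    print(format_price("1.299", None))
    print(format_price(None, "90"))
```

In `parse_products` you could then write `value = format_price(real, centavo)` in place of the f-string, so items without a price carry `None` rather than a misleading string.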