Why doesn't my Scrapy pagination work when I use a counter?

Question

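The code doesn't run. I'm trying to scrape multiple pages from different brands (categories). The page I'm working with lists all the brands. The brands are arranged in groups, sorted by each brand's first letter. Each brand's page, in turn, spans multiple pages of products.

I tried to write code that uses counters to walk through the brands: if a given first-letter group has no more brands, it moves on to the next group. (The requests are fine; the problem is in the code. The scraping itself works, and the code only fails when I add this pagination.)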

import scrapy
from scrapy import Request

class MlSpider(scrapy.Spider):
    name = "ml"

    def start_requests(self):
        yield scrapy.Request('https://lista.mercadolivre.com.br/produtos-cabelo')

    def parse(self, response, **kwargs):
        cgroup = 1
        cbrand = 1
        num_group = response.xpath(f'//div[@class="ui-search-search-modal-filter-group"][{cgroup}]').get()
        for m in num_group:
            link_marca = m.xpath(f'.//a[@class="ui-search-search-modal-filter ui-search-link"][{cbrand}]/@href').get()
            if link_marca:
                yield scrapy.Request(url=link_marca)
                for i in response.xpath('.//div[@class="ui-search-result__content"]'):
                    marca = i.xpath('.//span[@class="ui-search-item__brand-discoverability ui-search-item__group__element"]/text()').get()
                    title = i.xpath('.//h2/text()').get()
                    real = i.xpath('.//span[@class="andes-money-amount ui-search-price__part ui-search-price__part--medium andes-money-amount--cents-superscript"]//span[@class="andes-money-amount__fraction"]/text()').get()
                    centavo = i.xpath('//span[@class="andes-money-amount ui-search-price__part ui-search-price__part--medium andes-money-amount--cents-superscript"]//span[@class="andes-money-amount__cents andes-money-amount__cents--superscript-24"]/text()').get()
                    value = f'R$ {real},{centavo}'
                    link = i.xpath('.//a/@href').get()

                    yield {
                        'marca': marca,
                        'title': title,
                        'value': value,
                        'link': link
                    }

                next_page = response.xpath('//a[contains(@title,"Seguinte")]/@href').get()
                if next_page:
                    yield scrapy.Request(url=next_page, callback=self.parse)

                cbrand += 1

            else:
                cgroup += 1

Answer 1

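Pagination doesn't work because of where you placed the next-page logic. In addition, `.get()` returns a string, so `for m in num_group` iterates over characters and `m.xpath(...)` raises an `AttributeError`; and the group XPath is evaluated once, before the loop, so incrementing `cgroup` never changes which group is selected. I've edited your code so that it starts from the brands page, follows each brand link, extracts the product details, and, if there is a next page, follows it and scrapes the products there as well. I also edited some of your selectors, as shown below: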

import scrapy


class ProductsSpider(scrapy.Spider):
    name = "products"
    allowed_domains = ["lista.mercadolivre.com.br"]
    start_urls = [
        "https://lista.mercadolivre.com.br/produtos-cabelo_FiltersAvailableSidebar?filter=BRAND"
    ]

    def parse(self, response):
        # Collect every brand link from the brand filter page in one go.
        brand_links = response.xpath("//div[@class='ui-search-search-modal-grid-columns']/a/@href").getall()

        for link in brand_links:
            # Each brand page is handled by its own callback.
            yield scrapy.Request(link, callback=self.parse_products)

    def parse_products(self, response):
        # One item per product card on the current results page.
        for i in response.xpath('.//div[@class="ui-search-result__content"]'):
            marca = i.xpath('.//span[contains(@class, "ui-search-item__brand-discoverability")]/text()').get()
            title = i.xpath(".//h2/text()").get()
            real = i.xpath('.//span[@class="andes-money-amount__fraction"]/text()').get()
            centavo = i.xpath('.//span[contains(@class, "andes-money-amount__cents")]/text()').get()
            value = f"R$ {real},{centavo}"
            link = i.xpath(".//a/@href").get()

            yield {
                "marca": marca,
                "title": title,
                "value": value,
                "link": link,
            }

        # Follow the "Seguinte" (next) link, reusing this same callback.
        next_page = response.xpath('//a[contains(@title,"Seguinte")]/@href').get()
        if next_page:
            yield scrapy.Request(url=next_page, callback=self.parse_products)
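
One side note on the `value` line: `.get()` returns `None` when a selector matches nothing, so a listing without a cents element would come out as `R$ 12,None`. Below is a minimal sketch of a helper that guards against that. The name `format_price` is hypothetical, and treating a missing cents element as a whole-number price is my assumption about the site, not something I have verified.

```python
from typing import Optional


def format_price(real: Optional[str], centavo: Optional[str]) -> Optional[str]:
    """Build an 'R$ 12,90'-style string from the scraped fraction and cents.

    Hypothetical helper: assumes a missing cents element means ',00'.
    """
    if real is None:
        # No price fraction was found on the listing at all.
        return None
    # Fall back to "00" when the cents text is missing or only whitespace.
    cents = (centavo or "").strip() or "00"
    return f"R$ {real.strip()},{cents}"


if __name__ == "__main__":
    print(format_price("12", "90"))
    print(format_price("45", None))
    print(format_price(None, None))
```

Inside `parse_products` you could then write `value = format_price(real, centavo)` in place of the f-string.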
