How to get text information from each link in a list of links using Scrapy

Question (votes: 0, answers: 1)

I'm trying to scrape some text from each link. I have 600 links, but only 2 of them are included in this code.

usa-icon-list__content is the class of the elements that contain the information I need, and that class is used multiple times on the page.

import scrapy

class QuoteSpider(scrapy.Spider):
    name = 'tribe-spider'
    start_urls = [
        'https://www.bia.gov/bia/ois/tribal-leaders-directory/tribes/confederated-coos',
        'https://www.bia.gov/bia/ois/tribal-leaders-directory/tribes/confederated-yakama',
    ]

def parse(self, response):
        QUOTE_SELECTOR = '.usa-icon-list__content::text'
        
        for usa-icon-list__content in response.css(QUOTE_SELECTOR):
            yield {
                'text': usa-icon-list__content.css(TEXT_SELECTOR).extract_first(),
            }

I only need to extract this information for each link, but I'm not sure whether I'm using Scrapy correctly.

python web-scraping scrapy
1 Answer
0 votes

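You're on the right track, but there are a few small issues: def parse has to be indented so that it is a method of the spider class; usa-icon-list__content is not a valid Python variable name, because identifiers cannot contain hyphens; TEXT_SELECTOR is never defined; and since the selector already ends in ::text, each match is already a text node, so there is no need to call .css() on it again.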

Here is a revised version:

import scrapy

class TribeSpider(scrapy.Spider):
    name = 'tribe-spider'
    start_urls = [
        'https://www.bia.gov/bia/ois/tribal-leaders-directory/tribes/confederated-coos',
        'https://www.bia.gov/bia/ois/tribal-leaders-directory/tribes/confederated-yakama',
        # Add more URLs here if you have 600 of them
    ]

    def parse(self, response):
        TEXT_SELECTOR = '.usa-icon-list__content::text'
        
        for usa_icon_content in response.css(TEXT_SELECTOR):
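            yield {
                # ::text already selects text nodes, so .get() returns the
                # string itself; .strip() trims surrounding whitespace
                'text': usa_icon_content.get().strip(),
            }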
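To go from 2 links to 600, one option is to keep the URLs in a plain text file rather than pasting them into start_urls. A minimal helper, assuming a hypothetical file tribe_urls.txt with one URL per line; load_urls is likewise a hypothetical name.

```python
from pathlib import Path


def load_urls(path: str) -> list[str]:
    """Read one URL per line from a text file.

    Blank lines and lines starting with '#' are skipped, and duplicate
    URLs are dropped while keeping the original order.
    """
    seen: set[str] = set()
    urls: list[str] = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        url = line.strip()
        if not url or url.startswith("#") or url in seen:
            continue
        seen.add(url)
        urls.append(url)
    return urls
```

With the helper defined above the spider class in the same module, the class attribute becomes start_urls = load_urls("tribe_urls.txt"). Dropping duplicates up front avoids fetching the same page twice if the list was assembled by hand.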