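I'm trying to scrape some text from each link. I have 600 links, but only 2 of them are included in this code.
`usa-icon-list__content` is the class that contains the information I need, and it is used several times on the page.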
import scrapy

class QuoteSpider(scrapy.Spider):
    name = 'tribe-spider'
    start_urls = [
        'https://www.bia.gov/bia/ois/tribal-leaders-directory/tribes/confederated-coos',
        'https://www.bia.gov/bia/ois/tribal-leaders-directory/tribes/confederated-yakama',
    ]

    def parse(self, response):
        QUOTE_SELECTOR = '.usa-icon-list__content::text'
        for usa-icon-list__content in response.css(QUOTE_SELECTOR):
            yield {
                'text': usa-icon-list__content.css(TEXT_SELECTOR).extract_first(),
            }
I just need to extract this information for every link, but I don't know whether I'm using Scrapy correctly.
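You're on the right track, but there are a few small problems: the loop variable `usa-icon-list__content` is not a valid Python name, and `TEXT_SELECTOR` is used without ever being defined.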
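To see why the hyphenated loop variable in the question cannot work, here is a minimal sketch. The `compiles` helper and the two snippet constants are hypothetical, added only for illustration, and are not part of the spider. Python reads the hyphens in `usa-icon-list__content` as subtraction, so the `for` statement is rejected before any scraping happens, while an underscore-only name is accepted.

```python
def compiles(source: str) -> bool:
    """Return True if `source` is syntactically valid Python (hypothetical helper)."""
    try:
        compile(source, "<sketch>", "exec")
        return True
    except SyntaxError:
        return False


# Same loop shape as in the question, with an empty list standing in for
# response.css(...) so that no Scrapy objects are needed here.
BAD_LOOP = "for usa-icon-list__content in []:\n    pass\n"
GOOD_LOOP = "for usa_icon_content in []:\n    pass\n"

print(compiles(BAD_LOOP))   # False: hyphens make the loop target an expression
print(compiles(GOOD_LOOP))  # True: underscores form a valid identifier
```

The CSS class name inside the selector string can keep its hyphens; only the Python variable needs a different name.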
Here is a revised version:
import scrapy

class TribeSpider(scrapy.Spider):
    name = 'tribe-spider'
    start_urls = [
        'https://www.bia.gov/bia/ois/tribal-leaders-directory/tribes/confederated-coos',
        'https://www.bia.gov/bia/ois/tribal-leaders-directory/tribes/confederated-yakama',
        # Add more URLs here if you have 600 of them
    ]

    def parse(self, response):
        TEXT_SELECTOR = '.usa-icon-list__content::text'
        for usa_icon_content in response.css(TEXT_SELECTOR):
            yield {