Stuck extracting titles and next-page URLs from a web page

Question (votes: 1, answers: 1)

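I am trying to extract the link to each RV unit's detail page from these search results and from the paginated results that follow, so that I can follow those links to each RV unit on the site.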

import scrapy


class cwscrape(scrapy.Spider):
    name = 'rvlinks'

    start_urls = ['https://rv.campingworld.com/searchresults?condition=new_used&custompricerange=true&custompaymentrange=true&sort=featured_asc&zipsearch=true&search_mode=advanced&locations=nationwide']

    def parse(self, response):
        # Yield the full name of every RV listed on the current results page.
        for rvname in response.xpath("//div[@class='title']"):
            yield {'rv_full_name': rvname.xpath(".//span[@itemprop='name']/text()").extract_first()}

        # Follow the first link found in the pagination block, if there is one.
        next_page = response.xpath(".//div[@class='pagination-wrap']/a/@href").extract_first()
        if next_page is not None:
            next_page_link = response.urljoin(next_page)
            yield scrapy.Request(url=next_page_link, callback=self.parse)

An example URL for a unit's detail page would be: https://rv.campingworld.com/rvdetails/new-class-c-rvs/2019-thor-freedom-elite-26he-front-living-60k-BKY1571461

The next-page URL is: https://rv.campingworld.com/searchresults?condition=new_used&custompricerange=true&custompaymentrange=true&sort=featured_asc&zipsearch=true&search_mode=advanced&locations=nationwide&scpc=&make=&landingMake=0&page=2

python scrapy scrapy-spider
1 Answer
0 votes

I tried your code in scrapy shell and everything looks fine:

In [5]: response.xpath("//div[@class='title']//span[@itemprop='name']/text()").extract()
Out[5]: 
[u'2019 THOR FREEDOM ELITE 22HEC',
 u'2018 THOR GEMINI 23TR',
 u'2018 THOR GEMINI 23TK',
 u'2019 THOR FREEDOM ELITE 24HE',
 u'2019 WINNEBAGO MINNIE WINNIE 22R',
 u'2019 WINNEBAGO MINNIE WINNIE 22M',
 u'2019 WINNEBAGO OUTLOOK 27D',
 u'2019 THOR FREEDOM ELITE 28FE',
 u'2019 WINNEBAGO MINNIE WINNIE 25B',
 u'2019 THOR FREEDOM ELITE 28FE',
 u'2019 WINNEBAGO OUTLOOK 31N',
 u'2019 THOR QUANTUM RC25',
 u'2018 THOR SYNERGY JR24',
 u'2019 WINNEBAGO MINNIE WINNIE 26A',
 u'2019 THOR QUANTUM KM24',
 u'2019 WINNEBAGO MINNIE WINNIE 31G',
 u'2019 THOR SYNERGY 24SJ',
 u'2019 WINNEBAGO VIEW 24G',
 u'2019 WINNEBAGO VIEW 24V',
 u'2019 WINNEBAGO OUTLOOK 22E']

In [6]: response.xpath(".//div[@class='pagination-wrap']/a/@href").get()
Out[6]: u'https://rv.campingworld.com/searchresults?condition=new_used&custompricerange=true&custompaymentrange=true&sort=featured_asc&zipsearch=true&search_mode=advanced&locations=nationwide&scpc=&make=&landingMake=0&page=1'
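One detail in Out[6] may matter: the first href in the pagination block ends in page=1, while the question expects page=2. If the spider always follows that first link, it would likely re-crawl the first page once and then stop, because Scrapy's default duplicate filter drops requests it has already seen. A minimal sketch of a workaround follows, assuming the pagination block also contains a link whose `page` query parameter is the current page plus one. The helper names `page_number` and `pick_next_page` are hypothetical, not part of Scrapy.

```python
from urllib.parse import urljoin, urlparse, parse_qs


def page_number(url):
    """Return the integer `page` query parameter of url, defaulting to 1."""
    values = parse_qs(urlparse(url).query).get("page")
    try:
        return int(values[0]) if values else 1
    except ValueError:
        return 1


def pick_next_page(current_url, hrefs):
    """Return the absolute URL of the link to page current+1, or None."""
    wanted = page_number(current_url) + 1
    for href in hrefs:
        absolute = urljoin(current_url, href)
        if page_number(absolute) == wanted:
            return absolute
    return None


# Possible use inside parse() (assumes the same pagination XPath as above):
#   hrefs = response.xpath(".//div[@class='pagination-wrap']/a/@href").getall()
#   next_page = pick_next_page(response.url, hrefs)
#   if next_page is not None:
#       yield scrapy.Request(url=next_page, callback=self.parse)
```

Choosing the link by page number rather than by position avoids depending on the order of the anchors in the pagination block.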

What kind of problem are you running into?
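If the problem is that the output contains only names and no detail-page links, note that the spider in the question never selects any href for the units. The sketch below filters a list of hrefs down to detail pages, assuming (from the example URL in the question) that every detail page path starts with /rvdetails/. The helper name `detail_links` is hypothetical, and where those anchors sit in the page markup is not shown in the question, so the XPath in the usage comment is only a guess.

```python
from urllib.parse import urljoin, urlparse


def detail_links(page_url, hrefs):
    """Return absolute, de-duplicated links whose path starts with /rvdetails/."""
    seen = set()
    links = []
    for href in hrefs:
        absolute = urljoin(page_url, href)
        is_detail = urlparse(absolute).path.startswith("/rvdetails/")
        if is_detail and absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


# Possible use inside parse() (the broad //a/@href selector is an assumption):
#   for link in detail_links(response.url, response.xpath("//a/@href").getall()):
#       yield {'rv_detail_url': link}
```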
