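There is a problem with my parse_button callback: if a value has multiple options, my script cannot get the dropdown list.
Here are the scenarios for the different combinations:
Scenario 1 (works): Size 6ft 3in
Capacity 3-6kg
Pieces 1 Piece
Scenario 2 (works): Size 6ft 10in
Capacity 1.5-4kg (selected)
Pieces 1 Piece or 2 Piece
Scenario 3 (does not work): Size 6ft 4in
Capacity blank
Pieces blank
Can Python first click the button "button_url", then extract the second-level options (if there are several), and then extract the third-level options?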
import scrapy


class BcfSpider(scrapy.Spider):
    name = 'product_spider'
    start_urls = ['https://www.bcf.com.au/p/daiwa-23-td-black-spinning-rod/M675158.html']

    def parse(self, response):
        # Follow the link behind every size swatch on the product page.
        for button_url in response.css('.swatchanchor::attr(href)').getall():
            yield response.follow(button_url, callback=self.parse_button)

    def parse_button(self, response):
        # Collect the <option> entries of the variation dropdowns as {text: value}.
        dropdown_options = response.css('.variation-select option')
        dropdown_data = {}
        for option in dropdown_options:
            value = option.css('::attr(value)').get()
            text = option.css('::text').get()
            dropdown_data[text] = value
        yield {'dropdown_data': dropdown_data}
If you want to get all the capacities for a given size, you have to make an additional request per size, for example:
import requests
from bs4 import BeautifulSoup

id_ = "M675158"
url = f"https://www.bcf.com.au/p/daiwa-23-td-black-spinning-rod/{id_}.html"
api_url = "https://www.bcf.com.au/on/demandware.store/Sites-bcf-au-Site/en_AU/Product-Variation"

soup = BeautifulSoup(requests.get(url).content, "html.parser")

# Each size swatch on the product page is an <a class="swatchanchor"> inside an <li>.
for a in soup.select("li > a.swatchanchor"):
    size = a.text.strip()
    params = {
        "pid": id_,
        f"dwvar_{id_}_size": size,
        "source": "detail",
        "variant_order": "size",
        "Quantity": "1",
        "productlistid": "undefined",
        "format": "ajax",
    }
    # Request the variation HTML for this size.
    soup2 = BeautifulSoup(requests.get(api_url, params=params).content, "html.parser")
    # Skip the first <option> of the capacity dropdown.
    for c in soup2.select(".capacity option")[1:]:
        print(f"{size:<10} {c.text.strip()}")
Prints:
6ft        10-15kg
6ft 3in    3-6kg
6ft 4in    1-3kg
6ft 4in    1.5-4kg
6ft 4in    5-8kg
6ft 4in    6-12kg
6ft 10in   1.5-4kg
7ft        1-3kg
7ft        1.5-4kg
7ft        2-6kg
7ft        5-10kg
7ft        8-15kg
7ft 2in    1.5-3kg
7ft 4in    2-5kg
7ft 4in    5-8kg
7ft 4in    5-10kg
7ft 4in    10-15kg
7ft 6in    5-10kg
7ft 8in    1.5-4kg
7ft 8in    2-5kg
7ft 9in    15-24kg
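
To reuse this inside the Scrapy spider from the question, one option is to build the same Product-Variation URL yourself and hand it to a Scrapy request. The following is a minimal sketch using only the standard library. The endpoint and parameter names are copied from the requests example above; the helper name variation_url is hypothetical and not part of any library.

```python
from urllib.parse import urlencode

# Endpoint and parameter names are taken from the requests example above.
API_URL = "https://www.bcf.com.au/on/demandware.store/Sites-bcf-au-Site/en_AU/Product-Variation"


def variation_url(pid: str, size: str) -> str:
    """Build the Product-Variation URL for one size (hypothetical helper)."""
    params = {
        "pid": pid,
        f"dwvar_{pid}_size": size,
        "source": "detail",
        "variant_order": "size",
        "Quantity": "1",
        "productlistid": "undefined",
        "format": "ajax",
    }
    # urlencode escapes the values; the space in a size such as "6ft 4in" becomes "+".
    return f"{API_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(variation_url("M675158", "6ft 4in"))
```

In the spider, parse could read the size text from each .swatchanchor link, yield a request to variation_url(pid, size) with parse_button as the callback, and parse_button could then select ".capacity option" from the response, as the requests version does.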