How can I handle the country selection on Best Buy's landing page with Python, without using Selenium?

Question

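I'm trying to fetch content from the Best Buy website with Python, but I hit an obstacle right away on the country selection page. On the first visit, the site asks the user to choose a country, and this appears to be managed through JavaScript. I want to get past this page automatically so I can reach the site's main content.

I'm currently scraping with BeautifulSoup, but I know it doesn't handle JavaScript. If possible, I'd like to avoid Selenium or other browser automation tools.

Is there a way to simulate the country selection in Python with a library other than Selenium, for example through direct HTTP requests?

Any guidance on bypassing or simulating the country selection, or any alternative suggestions, would be greatly appreciated!

My code snippet: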

import requests
from bs4 import BeautifulSoup

def scrape_bestbuy(product_name):
    # Build the search URL, replacing spaces with '+'
    url = f"https://www.bestbuy.com/site/searchpage.jsp?st={product_name.replace(' ', '+')}"
    # get_random_user_agent() is my own helper (not shown) that supplies the request headers
    response = requests.get(url, headers=get_random_user_agent())
    soup = BeautifulSoup(response.text, 'html.parser')
    try:
        # select_one() returns None when nothing matches, so .text raises AttributeError
        product = soup.select_one('.sku-title a').text.strip()
        price = soup.select_one(".pricing-price div[data-testid='large-price'] .priceView-customer-price > span:nth-child(1)").text
        return {'Site': 'Bestbuy.com', 'Item title name': product, 'Price(USD)': price}
    except AttributeError:
        return {'Site': 'Bestbuy.com', 'Item title name': 'No Product Found', 'Price(USD)': 'N/A'}

Answer 1

When you select USA in the browser, you can see that it adds

&intl=nosplash

to the URL, so running

from bs4 import BeautifulSoup
import requests

product_name = "vacuum"

# Append intl=nosplash so the country selection page is skipped
url = f"https://www.bestbuy.com/site/searchpage.jsp?st={product_name.replace(' ', '+')}&intl=nosplash"
response = requests.get(url, headers={'User-agent': 'Mozilla/5.0'})

soup = BeautifulSoup(response.text, 'html.parser')

# Take the first product title and price on the results page
product = soup.select_one('.sku-title a').text.strip()
price = soup.select_one(".pricing-price div[data-testid='large-price'] .priceView-customer-price > span:nth-child(1)").text

print(price)

does in fact seem to work and prints

'$159.99'
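If you need the same parameter on more than one URL, the idea above can be sketched with the standard library's urllib.parse. This is only a rough sketch: the helper names build_search_url and add_nosplash are made up for illustration, and it assumes that intl=nosplash keeps behaving the way it did in the browser test above, which the site could change at any time.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SEARCH_URL = "https://www.bestbuy.com/site/searchpage.jsp"


def build_search_url(product_name: str) -> str:
    """Build a search URL that already carries intl=nosplash (hypothetical helper)."""
    # urlencode() escapes spaces as '+' and also escapes characters such as '&'
    query = urlencode({"st": product_name, "intl": "nosplash"})
    return f"{SEARCH_URL}?{query}"


def add_nosplash(url: str) -> str:
    """Add intl=nosplash to an existing URL, replacing any earlier intl value (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep every existing query parameter except a previous 'intl'
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "intl"]
    params.append(("intl", "nosplash"))
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    print(build_search_url("robot vacuum"))
    print(add_nosplash("https://www.bestbuy.com/site/searchpage.jsp?st=vacuum"))
```

The resulting string can be passed to requests.get() exactly as in the snippet above. Unlike replace(' ', '+'), urlencode() also escapes characters such as '&' in the search term, so a query like "AT&T phone" is not split into two parameters.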
