Beautiful Soup not locating inner span inside outer span

Question

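I'm trying to build a price tracker for Udemy courses as a personal project, since I often check the site to see whether courses I want to buy are on sale. I'm using Beautiful Soup to pull the price out of the page's HTML. Every time I test my code, when it reaches the line: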

price = soup.find(class_='usdr-sr-only').get_text()

it gives me "'NoneType' object has no attribute 'get_text'" (I do have the () after get_text in my code).

Here is that line in context:

import requests, os, lxml
from bs4 import BeautifulSoup

UDEMY_CLASS = input("Please provide the URL for the course whose price you'd like to track: ")
url = UDEMY_CLASS

header = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9"
}
response = requests.get(url, headers=header)

soup = BeautifulSoup(response.content, "lxml")
print(soup.prettify())
        
price = soup.find(class_="usd-sr-only").get_text()
#price_without_currency = price.split("$")[1]  # not always needed; inspect the element first
price_as_float = float(price)
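For context on the error: `soup.find()` returns `None` when no element matches, and `None` has no `get_text()` method, which produces exactly this `AttributeError`. The class in the code (`usd-sr-only`) does not match the `ud-sr-only` class in the page HTML. A minimal sketch of a guard that fails with a clearer message, assuming `bs4` is installed; the helper name `find_or_raise` is hypothetical and the HTML is a simplified version of the fragment in this question:

```python
from bs4 import BeautifulSoup


def find_or_raise(soup, **kwargs):
    """Return soup.find(**kwargs), or raise a descriptive error if nothing matches.

    Hypothetical helper for illustration; not part of Beautiful Soup.
    """
    tag = soup.find(**kwargs)
    if tag is None:
        # find() signals "no match" by returning None rather than raising
        raise LookupError(f"no element matched {kwargs!r}")
    return tag


# Simplified fragment based on the HTML in the question
html = '<div><span class="ud-sr-only">Current price</span><span>$13.99</span></div>'
soup = BeautifulSoup(html, "html.parser")

print(soup.find(class_="usd-sr-only"))  # None: no element has this class
# The page uses "ud-sr-only", but that span holds the label, not the price
print(find_or_raise(soup, class_="ud-sr-only").get_text())  # Current price
```

Note that even with the class name corrected, `ud-sr-only` is the screen-reader label ("Current price"), and `float("$13.99")` would still raise `ValueError` because of the dollar sign.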

And here is the HTML for the part of the page that shows the price:

<div class="base-price-text-module--price-part---xQlz ud-clp-discount-price ud-heading-xl" data-purpose='course-price-text'>
    <span class="ud-sr-only">Current price</span>
        <span>$13.99</span>

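My first guess is that I'm telling BS to look for the wrong class. How should I isolate the span that contains the price text? Thanks for any input, and let me know if I need to add more information.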
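One way to isolate the price, sketched under some assumptions: the live page contains the same markup as the fragment above (it may not, especially if the HTML is rendered by JavaScript), the `data-purpose="course-price-text"` attribute is a more stable hook than the generated class names, and the price is formatted as a dollar amount. The idea is to locate the container, find the `ud-sr-only` label, and take the `span` that follows it. The function name `extract_price` is hypothetical:

```python
from typing import Optional

from bs4 import BeautifulSoup

# Fragment copied from the question; the live page may differ.
HTML = """
<div class="base-price-text-module--price-part---xQlz ud-clp-discount-price ud-heading-xl"
     data-purpose="course-price-text">
    <span class="ud-sr-only">Current price</span>
    <span>$13.99</span>
</div>
"""


def extract_price(html: str) -> Optional[float]:
    """Return the course price as a float, or None if the markup isn't found."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: data-purpose is less likely to change than the CSS class names
    container = soup.find("div", attrs={"data-purpose": "course-price-text"})
    if container is None:
        return None
    label = container.find("span", class_="ud-sr-only")
    # The price is the span right after the screen-reader label
    price_span = label.find_next_sibling("span") if label else None
    if price_span is None:
        return None
    # get_text(strip=True) also works if the price sits in a nested inner span
    text = price_span.get_text(strip=True)  # e.g. "$13.99"
    # Assumption: USD formatting such as "$1,299.00"
    return float(text.replace("$", "").replace(",", ""))


print(extract_price(HTML))  # 13.99
```

Because `get_text()` collects text from all descendants, this handles both a flat `<span>$13.99</span>` and an outer span wrapping an inner one. A price shown as "Free" or in another currency would still need extra handling.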

Answer 1

Since Udemy is blocking the request, you can use Selenium instead of requests. Replace this part of your code:

response = requests.get(url, headers=header)
soup = BeautifulSoup(response.content, "lxml")

with this:

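from selenium import webdriver
from selenium.webdriver.chrome.service import Service
browser = webdriver.Chrome(service=Service(chrome_driver_path))  # Selenium 4 passes the driver path via Service
browser.get(url)
soup = BeautifulSoup(browser.page_source, "lxml")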

If you want to run the script in headless mode (Chrome runs in the background without opening a browser window), use this instead:

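from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

options = Options()
options.add_argument('--headless')  # run Chrome without a visible window
browser = webdriver.Chrome(service=Service(chrome_driver_path), options=options)
browser.get(url)
soup = BeautifulSoup(browser.page_source, "lxml")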
