AttributeError: 'NoneType' object has no attribute 'get_text' (HTML scraping)

Problem description

I am attempting my first web scrape, using the Amazon website, and I have hit an error I cannot resolve. This is the code I am trying to run:

# Import libraries
from bs4 import BeautifulSoup
import requests


# Connect to website 

URL = 'https://www.amazon.com/Hanes-Short-Sleeve-Beefy-T-Smoke/dp/B01KNM3EAM/ref=sr_1_4?crid=5ZT2DA0EILMA&keywords=t+shirts+for+man&qid=1692275014&sprefix=t+s%2Caps%2C422&sr=8-4'

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36", 
    "X-Amzn-Trace-Id": "Root=1-64de10e0-49a614555bcfb2fc5d15488a"}

page = requests.get(URL, headers=headers)

soup1 = BeautifulSoup(page.content, "lxml")

soup2 = BeautifulSoup(soup1.prettify(), "lxml")

title = soup2.find(id="productTitle").get_text()

print(title)

The code raises the error "'NoneType' object has no attribute 'get_text'". If I run it without the .get_text() call, find() returns None.

This is the output:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [42], in <cell line: 14>()
     10 soup1 = BeautifulSoup(page.content, "lxml")
     12 soup2 = BeautifulSoup(soup1.prettify(), "lxml")
---> 14 title = soup2.find(id="productTitle").get_text()
     16 print(title)

AttributeError: 'NoneType' object has no attribute 'get_text'

P.S. The productTitle id does exist on the page and contains text:

<span id="productTitle" class="a-size-large product-title-word-break">  "Hanes Mens Beefyt T-Shirt, Classic Heavyweight Cotton Crewneck Tee, Roomy Fit, 1 Or 2 Pack, Available in Tall" </span>

I have seen answers to this question on this site, but none of them helped. Thanks in advance.

python web-scraping attributeerror
1 answer

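When you scrape programmatically with the requests library, sites such as Amazon can easily detect and block that activity, so the HTML you receive may not contain the productTitle element at all. In that case find() returns None and the .get_text() call fails. With Selenium, a web automation tool that drives a real browser, we can accomplish this task.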
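Before switching tools, it may help to confirm that the response really is a block page rather than the product page. The following is only a minimal sketch: `looks_blocked` is a hypothetical helper, and the marker strings are assumptions about what an anti-bot page might contain, not an official list. The HTML snippets are hand-written for an offline demonstration and are not real Amazon responses.

```python
# Hypothetical marker strings; these are assumptions, adjust them to what you actually receive.
BLOCK_MARKERS = ("captcha", "robot check", "automated access")


def looks_blocked(status_code, html):
    """Return True if the response looks like an anti-bot page instead of a product page."""
    # Any non-200 status is treated as a failed fetch.
    if status_code != 200:
        return True
    lowered = html.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


# Hand-written snippets for illustration only.
product_html = '<span id="productTitle"> Hanes Mens Beefyt T-Shirt </span>'
captcha_html = "<html><body><p>Enter the characters you see below (CAPTCHA)</p></body></html>"

print(looks_blocked(200, product_html))  # False
print(looks_blocked(200, captcha_html))  # True
print(looks_blocked(503, ""))            # True
```

With the code from the question, this could be called as `looks_blocked(page.status_code, page.text)` right after `requests.get(...)`. Printing the first few hundred characters of `page.text` is another quick way to see what was actually returned.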

Try this Selenium code :)

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

URL = 'https://www.amazon.com/Hanes-Short-Sleeve-Beefy-T-Smoke/dp/B01KNM3EAM/ref=sr_1_4?crid=5ZT2DA0EILMA&keywords=t+shirts+for+man&qid=1692275014&sprefix=t+s%2Caps%2C422&sr=8-4'
chrome_driver_path = '/usr/bin/chromedriver'

# Run Chrome without opening a visible browser window
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")

# Selenium 4 takes the driver path through a Service object;
# the executable_path argument was removed in later 4.x releases
driver = webdriver.Chrome(service=Service(chrome_driver_path), options=chrome_options)
try:
    driver.get(URL)

    # find_element_by_xpath was removed in Selenium 4; use find_element with By.XPATH
    product_title_element = driver.find_element(By.XPATH, '//*[@id="productTitle"]')
    product_title = product_title_element.text

    print(product_title) # Output: Hanes Mens Beefyt T-Shirt, Classic Heavyweight Cotton Crewneck Tee, Roomy Fit, 1 Or 2 Pack, Available in Tall
finally:
    # Always close the browser, even if the element lookup fails
    driver.quit()
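
However the HTML is fetched, BeautifulSoup's find() returns None when nothing matches, so it is worth checking the result before calling .get_text(). Below is a minimal sketch of that guard. `extract_title` is a hypothetical helper name, and it uses the built-in "html.parser" instead of the "lxml" parser from the question (an assumption made only to avoid an extra dependency). The HTML strings are hand-written examples.

```python
from bs4 import BeautifulSoup


def extract_title(html):
    """Return the stripped productTitle text, or None if the element is missing."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(id="productTitle")
    if element is None:
        # The element is not in this HTML (for example, a block page was returned).
        return None
    return element.get_text(strip=True)


# Hand-written snippets for illustration only.
with_title = '<span id="productTitle">  Hanes Mens Beefyt T-Shirt  </span>'
without_title = "<html><body><p>Robot check</p></body></html>"

print(extract_title(with_title))     # Hanes Mens Beefyt T-Shirt
print(extract_title(without_title))  # None
```

The same helper could be fed `page.text` from requests or `driver.page_source` from Selenium. Note that the second parse in the question, `BeautifulSoup(soup1.prettify(), "lxml")`, should not be needed to find the element.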