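I have a question about my code because it isn't working. Visual Studio shows an AttributeError on the category variable - Traceback (most recent call last): File "", line 11, in AttributeError: 'NoneType' object has no attribute 'find_all'
I can't figure out where the problem is. I'm stuck, so I'd be glad if someone could tell me where I made the mistake. Here is the code: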
from bs4 import BeautifulSoup
import csv
import pandas as pd
url = 'https://books.toscrape.com/'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
books_data = []
for page_num in range(1,51):
    url = f'https://books.toscrape.com/catalogue/page-{page_num}.html'
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    books = soup.find_all('h3')
    for book in books:
        book_url = book.find('a')['href']
        book_response = requests.get(url + book_url)
        book_soup = BeautifulSoup(book_response.content, 'html.parser')
        title = book_soup.find('h1').text
        category = book_soup.find('ul', class_ = 'breadcrumb').find_all('a')[2].text.strip()
        rating = book_soup.find('p', class_ = 'star-rating')['class'][1]
        price = book_soup.find('p', class_ = 'price_color').text.strip()
        availibility = book_soup.find('p', class_ = 'availibility').text.strip()
        books_data = ([title, category, rating, price, availibility])
print(books_data)
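The problem is in how book_url is formed. You are concatenating the paginated page URL with the book's href value, so the resulting book_url is not a valid URL. book_response then contains only a 404 Not Found page with an h1 tag. That page has no breadcrumb list, so book_soup.find('ul', class_ = 'breadcrumb') returns None, and calling find_all on it raises the AttributeError. There are a few other issues as well, such as the data never being appended to books_data.
See the modified code below: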
from bs4 import BeautifulSoup
import requests
import csv
import pandas as pd
books_data = []
for page_num in range(1,51):
    # Listing page: keep its URL in its own variable
    page_url = f'https://books.toscrape.com/catalogue/page-{page_num}.html'
    response = requests.get(page_url)
    soup = BeautifulSoup(response.content, 'html.parser')
    books = soup.find_all('h3')
    for book in books:
        # The href is relative, so build the book URL from the catalogue base
        book_href = book.find('a')['href']
        book_url = f"https://books.toscrape.com/catalogue/{book_href}"
        book_response = requests.get(book_url)
        book_soup = BeautifulSoup(book_response.content, 'html.parser')
        title = book_soup.find('h1').text
        category = book_soup.find('ul', class_ = 'breadcrumb').find_all('a')[2].text.strip()
        rating = book_soup.find('p', class_ = 'star-rating')['class'][1]
        price = book_soup.find('p', class_ = 'price_color').text.strip()
        availibility = book_soup.find_all('p', class_ = ['instock', 'availibility'])[0].text.strip()
        data = [title, category, rating, price, availibility]
        # Append each row instead of overwriting the list
        books_data.append(data)
print(books_data)
Output:
[['A Light in the Attic', 'Poetry', 'Three', '£51.77', 'In stock (22 available)'],
['Tipping the Velvet', 'Historical Fiction', 'One', '£53.74', 'In stock (20 available)'],
['Soumission', 'Fiction', 'One', '£50.10', 'In stock (20 available)'],
['Sharp Objects', 'Mystery', 'Four', '£47.82', 'In stock (20 available)'],
.
.
.
['Mesaerion: The Best Science Fiction Stories 1800-1849', 'Science Fiction', 'One', '£37.59', 'In stock (19 available)'],
['Libertarianism for Beginners', 'Politics', 'Two', '£51.33', 'In stock (19 available)'],
["It's Only the Himalayas", 'Travel', 'Two', '£45.17', 'In stock (19 available)']]
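As a side note on the URL issue described above, here is a minimal sketch of the difference between plain string concatenation and resolving the href relative to the listing page. It uses urljoin from the standard library rather than the hard-coded base in the modified code, so it is an alternative and not what the answer itself does. The href value is only an assumed example of what a listing page might contain.

```python
from urllib.parse import urljoin

# Listing page the href was found on
page_url = "https://books.toscrape.com/catalogue/page-1.html"
# Assumed example of a relative href taken from an <h3><a> element
book_href = "a-light-in-the-attic_1000/index.html"

# What the original code effectively did: glue the two strings together
broken_url = page_url + book_href

# Resolve the relative href against the page it came from
book_url = urljoin(page_url, book_href)

print(broken_url)
print(book_url)
```

The first printed URL still has page-1.html fused into the path, which is why the request came back as 404. The second one points at the book page inside the catalogue directory.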