Web scraping with Python and BeautifulSoup

Question

I ran into a problem while parsing links with Python. Here is my code:

def get_content(html):
    soup = BeautifulSoup(html, 'lxml')
    items = soup.find_all('div', class_='grid-item___eaXVb')

    for item in items:
        link = item.find('a', class_='gl-product-card__details-link')
        print(link.get('href'))

I get this error:

Traceback (most recent call last):
  File "parser.py", line 32, in <module>
    parse()
  File "parser.py", line 27, in parse
    get_content(html.text)
  File "parser.py", line 21, in get_content
    print(link.get('href'))
AttributeError: 'NoneType' object has no attribute 'get'

But when I try this instead:

    for item in items:
        link = item.find('a', class_='gl-product-card__details-link')
        print(type(link))

I get output showing that every link has a type:

<class 'bs4.element.Tag'>
<class 'bs4.element.Tag'>
...
...
...
<class 'bs4.element.Tag'>
<class 'bs4.element.Tag'>

What am I doing wrong? What is the problem?

python parsing web-scraping beautifulsoup lxml
1 Answer

To get the titles and links of all products, you can use this example:

import requests
from bs4 import BeautifulSoup


url = 'https://www.adidas.com/us/men-shoes?price=price%3C50.0'
# Send a browser-like User-Agent header with the request.
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}

soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')

# Select the media link inside each product container.
for a in soup.select('div[class^="product-container"] a.gl-product-card__media-link'):
    # The product title is the next element with class "gl-label".
    label = a.find_next(class_='gl-label')
    # Prepend the domain to the href to build a full URL.
    print('{:<50} {}'.format(label.text, 'https://www.adidas.com' + a['href']))

Prints:

Adilette Lite Slides                               https://www.adidas.com/us/adilette-lite-slides/FU8299.html
Adilette Aqua Slides                               https://www.adidas.com/us/adilette-aqua-slides/F35550.html
U_Path Run Shoes                                   https://www.adidas.com/us/u_path-run-shoes/EE4466.html
adiease Shoes                                      https://www.adidas.com/us/adiease-shoes/BY4027.html
Nizza RF Slip-on Shoes                             https://www.adidas.com/us/nizza-rf-slip-on-shoes/EF1410.html
Adilette Slides                                    https://www.adidas.com/us/adilette-slides/280647.html
Goletto VII Turf Shoes                             https://www.adidas.com/us/goletto-vii-turf-shoes/FV8703.html
Adilette Comfort Slides                            https://www.adidas.com/us/adilette-comfort-slides/FW5337.html
Adilette Comfort Slides                            https://www.adidas.com/us/adilette-comfort-slides/FW5353.html
Adizero Spark MD Cleats                            https://www.adidas.com/us/adizero-spark-md-cleats/EF3476.html
CP Traxion Spikeless Shoes                         https://www.adidas.com/us/cp-traxion-spikeless-shoes/EE9206.html
CP Traxion Spikeless Shoes                         https://www.adidas.com/us/cp-traxion-spikeless-shoes/BB7900.html
CP Traxion Spikeless Shoes                         https://www.adidas.com/us/cp-traxion-spikeless-shoes/BD7138.html
CP Traxion Spikeless Shoes                         https://www.adidas.com/us/cp-traxion-spikeless-shoes/F34996.html
Adilette Lite Slides                               https://www.adidas.com/us/adilette-lite-slides/FU8296.html
Afterburner 6 Grail MD Cleats                      https://www.adidas.com/us/afterburner-6-grail-md-cleats/DB3106.html
Lite Racer CLN Shoes                               https://www.adidas.com/us/lite-racer-cln-shoes/EE8138.html

... and so on.
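
As for the AttributeError in the question: `find()` returns `None` when nothing matches, so presumably at least one of the grid items has no `a.gl-product-card__details-link` inside it, and that item may be hidden in the elided part of the `type(link)` output. Below is a minimal sketch of a more defensive loop, reusing the class names from the question. The function name `extract_links`, the `base_url` default, and the sample markup are hypothetical and only there for illustration; the real page structure may differ.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url='https://www.adidas.com'):
    """Return absolute product URLs, skipping grid items without a link."""
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    for item in soup.find_all('div', class_='grid-item___eaXVb'):
        link = item.find('a', class_='gl-product-card__details-link')
        if link is None:
            # find() returns None when no matching tag exists in this item.
            continue
        href = link.get('href')
        if href:
            # urljoin turns a site-relative href into an absolute URL.
            links.append(urljoin(base_url, href))
    return links


# Hypothetical markup: the second grid item has no product link.
sample_html = '''
<div class="grid-item___eaXVb">
  <a class="gl-product-card__details-link" href="/us/adilette-lite-slides/FU8299.html">Adilette Lite Slides</a>
</div>
<div class="grid-item___eaXVb"><span>banner</span></div>
'''

print(extract_links(sample_html))
```

With the check in place, an item without a link is skipped instead of raising. Printing `repr(link)` for each item in the original loop should also show which entry comes back as `None`.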