"NoneType is not iterable" error when web scraping with Python 3.8

Question   Votes: 0   Answers: 1

I have been assigned to build a web scraper that extracts links. I can successfully extract this data:

/
/users/sign_up
/topics
/smarties
/posts
/users/sign_in
/users/sign_up
/posts/installing-anaconda-python-data-science-platform
/topics/python
/topics/anaconda-python
/topics/machine-learning
/jordan
/posts/python-libraries-to-import-for-data-science-programs
/topics/python
/topics/data-science
/topics/machine-learning
/jordan
/posts/shortcut-for-opening-the-object-inspector-in-python-spyder
/topics/python
/topics/anaconda-python
/topics/spyder-python
/topics/machine-learning
/jordan
/posts/python-script-for-replacing-missing-data-in-a-machine-learning-algorithm
/topics/machine-learning
/topics/python
/jordan
/posts/python-script-for-pulling-in-the-same-column-from-an-array-of-arrays
/topics/python
/jordan
/posts/how-to-implement-fizzbuzz-in-python
/topics/fizzbuzz
/topics/python
/jordan
/posts/how-to-think-like-a-computer-scientist
/topics/computer-science
/topics/python
/topics/programming
/jordan
/posts/base-case-example-for-how-to-test-a-python-class
/topics/python
/topics/tdd
/jordan
/posts/installing-and-working-with-pipenv
/topics/pipenv
/topics/python
/jordan
/posts/steps-for-building-a-flask-api-application-with-python-3
/topics/flask
/topics/tutorial
/topics/python
/jordan
None
/topics/python?page=2
/topics/python?page=3
/topics/python?page=4
/topics/python?page=2
/topics/python?page=4

after running this code:

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('http://www.dailysmarty.com/topics/python')

soup = bs(r.text, 'html.parser')

for link in soup.find_all('a'):
    print(link.get('href'))

But when I run this generator that I am writing:

def generator(web):
    titles = []
    for link in web:
        if 'posts' in link.get('href'):
            print(link.get('href'))
        else:
            pass


data = soup.find_all('a')
#generator(data)

I get this output, followed by this traceback:

/posts
/posts/installing-anaconda-python-data-science-platform
/posts/python-libraries-to-import-for-data-science-programs
/posts/shortcut-for-opening-the-object-inspector-in-python-spyder
/posts/python-script-for-replacing-missing-data-in-a-machine-learning-algorithm
/posts/python-script-for-pulling-in-the-same-column-from-an-array-of-arrays
/posts/how-to-implement-fizzbuzz-in-python
/posts/how-to-think-like-a-computer-scientist
/posts/base-case-example-for-how-to-test-a-python-class
/posts/installing-and-working-with-pipenv
/posts/steps-for-building-a-flask-api-application-with-python-3
Traceback (most recent call last):
  File "C:\Users\joshu\AppData\Local\Programs\Python\Python38\classes.py", line 18, in <module>
    generator(data)
  File "C:\Users\joshu\AppData\Local\Programs\Python\Python38\classes.py", line 13, in generator
    if 'posts' in link.get('href'):
TypeError: argument of type 'NoneType' is not iterable

How can I make the generator get past the None value in the for loop without raising an error?

python html python-3.x web-scraping href
1 answer
0 votes

You have to check that the link actually has an "href" attribute before testing its value:

if link.has_attr('href') and 'posts' in link.get('href'):
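
The reason this is needed: the `None` in your first output is an `<a>` tag with no `href`, so `link.get('href')` returns `None`, and `'posts' in None` raises the `TypeError`. Below is a minimal sketch of the same check folded into an actual generator. The name `post_links` is hypothetical, and instead of `has_attr` it tests the value returned by `get`, which gives the same result here and also skips empty hrefs.

```python
def post_links(links):
    """Yield the href of every link whose href contains 'posts'.

    `links` is any iterable of objects with a dict-style .get() method,
    such as the Tag objects returned by soup.find_all('a').
    The name post_links is hypothetical, not from the question.
    """
    for link in links:
        href = link.get('href')  # None when the <a> tag has no href attribute
        # Skip None (and empty) hrefs before running the substring test.
        if href and 'posts' in href:
            yield href
```

With the `data = soup.find_all('a')` list from the question, `for href in post_links(data): print(href)` should print only the `/posts` links and pass over the tag without an `href`.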