Beautiful Soup traceback on first attempt

Question  Votes: 0  Answers: 2

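Hi, I'm new to Python and Beautiful Soup. I installed BS4 with pip install and am trying to do some web scraping. I've gone through a lot of help guides, but I can't get BeautifulSoup() to work when I run my script from the command prompt. Here is my code: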

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('Enter - ')
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')

# Retrieve all of the anchor tags
tags = soup('a')
for tag in tags:
    print(tag.get('href', None))

This is the traceback I get after entering a URL:

C:\Users\aaron\OneDrive\Desktop\Coding>python urllinks_get.py
Enter - http://www.dr-chuck.com/page1.htm
Traceback (most recent call last):
  File "C:\Users\aaron\OneDrive\Desktop\Coding\urllinks_get.py", line 21, in <module>
    soup = BeautifulSoup(html, 'html.parser')
  File "C:\Users\aaron\OneDrive\Desktop\Coding\bs4\__init__.py", line 215, in __init__
    self._feed()
  File "C:\Users\aaron\OneDrive\Desktop\Coding\bs4\__init__.py", line 239, in _feed
    self.builder.feed(self.markup)
  File "C:\Users\aaron\OneDrive\Desktop\Coding\bs4\builder\_htmlparser.py", line 164, in feed
    parser.feed(markup)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2032.0_x64__qbz5n2kfra8p0\lib\html\parser.py", line 110, in feed
    self.goahead(0)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2032.0_x64__qbz5n2kfra8p0\lib\html\parser.py", line 170, in goahead
    k = self.parse_starttag(i)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2032.0_x64__qbz5n2kfra8p0\lib\html\parser.py", line 344, in parse_starttag
    self.handle_starttag(tag, attrs)
  File "C:\Users\aaron\OneDrive\Desktop\Coding\bs4\builder\_htmlparser.py", line 62, in handle_starttag
    self.soup.handle_starttag(name, None, None, attr_dict)
  File "C:\Users\aaron\OneDrive\Desktop\Coding\bs4\__init__.py", line 404, in handle_starttag
    self.currentTag, self._most_recent_element)
  File "C:\Users\aaron\OneDrive\Desktop\Coding\bs4\element.py", line 1001, in __getattr__
    return self.find(tag)
  File "C:\Users\aaron\OneDrive\Desktop\Coding\bs4\element.py", line 1238, in find
    l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
  File "C:\Users\aaron\OneDrive\Desktop\Coding\bs4\element.py", line 1259, in find_all
    return self._find_all(name, attrs, text, limit, generator, **kwargs)
  File "C:\Users\aaron\OneDrive\Desktop\Coding\bs4\element.py", line 516, in _find_all
    strainer = SoupStrainer(name, attrs, text, **kwargs)
  File "C:\Users\aaron\OneDrive\Desktop\Coding\bs4\element.py", line 1560, in __init__
    self.text = self._normalize_search_value(text)
  File "C:\Users\aaron\OneDrive\Desktop\Coding\bs4\element.py", line 1565, in _normalize_search_value
    if (isinstance(value, str) or isinstance(value, collections.Callable) or hasattr(value, 'match')
AttributeError: module 'collections' has no attribute 'Callable'

I'd really like to continue with my online course, so any help would be greatly appreciated!

Thanks!

python beautifulsoup urllib traceback urlopen
2 Answers
0
votes

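Found my problem. I had installed beautifulsoup4 with pip, but I also had a bs4 folder in the same directory my program runs from. I didn't realize the two would interfere with each other: Python imports the local bs4 folder first, so the copy shown in the traceback (under Desktop\Coding\bs4) was being used instead of the installed package. Once I deleted the bs4 folder from that directory, my program ran fine :)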
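
For anyone who wants to check for this kind of shadowing before deleting anything, here is a minimal sketch. The helper names `module_origin` and `is_loaded_from` are made up for this illustration and are not part of bs4; the sketch only relies on the standard library's `importlib.util.find_spec`, which reports where a module would be loaded from without importing it.

```python
import importlib.util
from pathlib import Path


def module_origin(name):
    """Return the file Python would load `name` from, or None if it is not found.

    Hypothetical helper for this example; it is not part of bs4.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return spec.origin


def is_loaded_from(name, directory):
    """Return True if `name` would be imported from somewhere inside `directory`."""
    origin = module_origin(name)
    if origin is None or origin in ("built-in", "frozen"):
        return False
    try:
        # relative_to() raises ValueError when origin is not under directory
        Path(origin).resolve().relative_to(Path(directory).resolve())
    except ValueError:
        return False
    return True


# Show which copy of bs4 an "import bs4" would pick up (None if not installed)
print("bs4 would be imported from:", module_origin("bs4"))
if is_loaded_from("bs4", Path.cwd()):
    print("A bs4 folder in the current directory is shadowing the installed package.")
```

If the printed path points into your project folder rather than into site-packages, the local copy is the one being imported.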


0
votes

  File "/data/data/com.termux/files/usr/lib/python3.11/site-packages/bs4/element.py", line 528, in _find_all
    strainer = SoupStrainer(name, attrs, text, **kwargs)
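
The traceback in the question ends at `collections.Callable`. That alias was removed from the `collections` module in Python 3.10; the class now lives only in `collections.abc`. The sketch below just illustrates that difference. The function name `is_callable_value` is made up for the example and is not bs4's own code. Whether the frame quoted in this answer ends in the same error is an assumption, since the answer shows only one frame.

```python
import collections
import collections.abc


def is_callable_value(value):
    """Check callability using the location that works on current Python versions.

    Hypothetical helper for illustration; not taken from bs4.
    """
    return isinstance(value, collections.abc.Callable)


# On Python 3.10 and later the old alias is gone, so code that still refers to
# collections.Callable raises AttributeError, as in the question's traceback.
print("collections has Callable:", hasattr(collections, "Callable"))
print("collections.abc has Callable:", hasattr(collections.abc, "Callable"))
print(is_callable_value(len), is_callable_value("text"))
```

If an installed copy of bs4 under site-packages still raises this error, it is probably an old release, and upgrading with `pip install --upgrade beautifulsoup4` is a reasonable thing to try.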
