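I've been trying to scrape a web page, but I can't get it to work when I filter the results case-insensitively, that is, without requiring a 100% match on uppercase, lowercase, etc.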
import requests
from bs4 import BeautifulSoup
URL = "https://www.pemex.com/procura/procedimientos-de-contratacion/concursosabiertos/Paginas/Pemex-Transformación-Industrial.aspx"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
results = soup.find(id="MSOZoneCell_WebPartWPQ4")
texto_licitacion = results.find_all("td", string=lambda text: "Bienes" in text.lower())
This is the error I get:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\acast\AppData\Roaming\Python\Python311\site-packages\bs4\element.py", line 2030, in find_all
return self._find_all(name, attrs, string, limit, generator,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\acast\AppData\Roaming\Python\Python311\site-packages\bs4\element.py", line 841, in _find_all
found = strainer.search(i)
^^^^^^^^^^^^^^^^^^
File "C:\Users\acast\AppData\Roaming\Python\Python311\site-packages\bs4\element.py", line 2320, in search
found = self.search_tag(markup)
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\acast\AppData\Roaming\Python\Python311\site-packages\bs4\element.py", line 2291, in search_tag
if found and self.string and not self._matches(found.string, self.string):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\acast\AppData\Roaming\Python\Python311\site-packages\bs4\element.py", line 2352, in _matches
return match_against(markup)
^^^^^^^^^^^^^^^^^^^^^
File "<stdin>", line 2, in <lambda>
AttributeError: 'NoneType' object has no attribute 'lower'
I've tried the same approach on another web page and it works fine, but on this one it doesn't.
Some elements have no text, so text is None for them and calling text.lower() fails. Check for that in your filter:
# Compare against a lowercase literal: text.lower() can never contain "Bienes"
texto_licitacion = results.find_all("td", string=lambda text: text and "bienes" in text.lower())
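If you need the same check for several search terms, the None guard and the case folding can be pulled into a small reusable predicate. This is only a sketch: make_text_filter and is_bienes are hypothetical names that are not part of BeautifulSoup, and the block uses plain strings so it runs without fetching the page.

```python
from typing import Callable, Optional


def make_text_filter(needle: str) -> Callable[[Optional[str]], bool]:
    """Build a None-safe, case-insensitive substring predicate (hypothetical helper)."""
    wanted = needle.casefold()  # normalise the search term once

    def matches(text: Optional[str]) -> bool:
        # The filter may receive None for elements without text, so guard first
        return text is not None and wanted in text.casefold()

    return matches


is_bienes = make_text_filter("Bienes")
print(is_bienes("BIENES y servicios"))  # True
print(is_bienes(None))                  # False
```

With the results object from the question, the predicate would be passed the same way as the lambda: results.find_all("td", string=make_text_filter("Bienes")).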