Web scraping: cannot match text content regardless of case

Question

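I have been trying to scrape a web page, but I can't get it to work when I want to filter the results without requiring an exact match (uppercase, lowercase, and so on).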

import requests
from bs4 import BeautifulSoup
URL = "https://www.pemex.com/procura/procedimientos-de-contratacion/concursosabiertos/Paginas/Pemex-Transformación-Industrial.aspx"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
results = soup.find(id="MSOZoneCell_WebPartWPQ4")


texto_licitacion = results.find_all("td", string=lambda text: "Bienes" in text.lower())

I get this result:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\acast\AppData\Roaming\Python\Python311\site-packages\bs4\element.py", line 2030, in find_all
    return self._find_all(name, attrs, string, limit, generator,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\acast\AppData\Roaming\Python\Python311\site-packages\bs4\element.py", line 841, in _find_all
    found = strainer.search(i)
            ^^^^^^^^^^^^^^^^^^
  File "C:\Users\acast\AppData\Roaming\Python\Python311\site-packages\bs4\element.py", line 2320, in search
    found = self.search_tag(markup)
            ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\acast\AppData\Roaming\Python\Python311\site-packages\bs4\element.py", line 2291, in search_tag
    if found and self.string and not self._matches(found.string, self.string):
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\acast\AppData\Roaming\Python\Python311\site-packages\bs4\element.py", line 2352, in _matches
    return match_against(markup)
           ^^^^^^^^^^^^^^^^^^^^^
  File "<stdin>", line 2, in <lambda>
AttributeError: 'NoneType' object has no attribute 'lower'

I tried the same approach on another web page and it worked fine, but on this page it fails.

python web-scraping findall
1 Answer

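Some elements have no text, so the text passed to your filter is None (this happens when a td is empty or contains more than one child node), and calling text.lower() on it raises the AttributeError. Check for that in your filter.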

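# Skip cells whose text is None; the search term must itself be lowercase,
# because text.lower() can never contain the capitalised "Bienes".
texto_licitacion = results.find_all("td", string=lambda text: text and "bienes" in text.lower())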
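The same guard can be pulled out into a reusable filter. This is only a minimal sketch: the names `contains_ci` and `find_cells` are hypothetical helpers introduced for illustration, and `casefold()` is swapped in for `lower()` as a slightly more thorough case normalisation.

```python
from typing import Callable, Optional


def contains_ci(needle: str) -> Callable[[Optional[str]], bool]:
    """Build a case-insensitive filter suitable for find_all(string=...)."""
    wanted = needle.casefold()  # normalise the search term once

    def matcher(text: Optional[str]) -> bool:
        # BeautifulSoup passes None when a tag has no single string,
        # so check for None before calling any str method.
        return text is not None and wanted in text.casefold()

    return matcher


def find_cells(html: str, needle: str):
    """Return the <td> tags whose single string contains needle, ignoring case."""
    # Third-party import kept local so the matcher itself has no dependencies.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    return soup.find_all("td", string=contains_ci(needle))
```

With the code from the question, this would be used as `results.find_all("td", string=contains_ci("Bienes"))`. Because both sides are normalised, the capitalisation of the search term no longer matters, and cells without a single string are simply skipped.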