Problem with find_all in BeautifulSoup4

Question

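I want to scrape information from the website below: the book title, code, price, and so on. As an example, let's focus on the ISBN code. I want to find any text in the HTML that contains the word "ISBN".

My code is as follows: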

    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.boekenprijs.be/uitgebreid-zoeken?zoek=&veld=all&gbpstartdatumvan=&gbpstartdatumtotenmet=&gbpeinddatumvan=01/04/2024&gbpeinddatumtotenmet=12/08/2024&_token=FAoSCCoUK-SPrL-ktj4MtsVBv3L4K-FaH3jxSo259D0&page=1'

    # Fetch the search results page and parse it
    result = requests.get(url)
    doc = BeautifulSoup(result.text, "html.parser")

    # Look for text nodes equal to "ISBN"
    aux = doc.find_all(string="ISBN")

My problem is that the result, aux, is empty. I cannot find anything containing ISBN, even though I can see the word when I look at the HTML.

Answer 1

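Passing string="ISBN" only matches text nodes that are exactly equal to "ISBN", and on this page the text is surrounded by whitespace. As mentioned in the comments, you can use the re module to search for the string instead: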

import re

import requests
from bs4 import BeautifulSoup

url = "https://www.boekenprijs.be/uitgebreid-zoeken?zoek=&veld=all&gbpstartdatumvan=&gbpstartdatumtotenmet=&gbpeinddatumvan=01/04/2024&gbpeinddatumtotenmet=12/08/2024&_token=FAoSCCoUK-SPrL-ktj4MtsVBv3L4K-FaH3jxSo259D0&page=1"
result = requests.get(url)

doc = BeautifulSoup(result.text, "html.parser")
# A compiled pattern matches any text node that contains "ISBN"
aux = doc.find_all(string=re.compile("ISBN"))

print(aux)

Prints:

['\n                        ISBN ', '\n                        ISBN ', '\n                        ISBN ', '\n                        ISBN ', '\n                        ISBN ', '\n                        ISBN ', '\n                        ISBN ', '\n                        ISBN ', '\n                        ISBN ', '\n                        ISBN ']
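To see the matching behaviour without fetching the live page, here is a minimal sketch. The HTML fragment is a hypothetical sample modeled on the markup shown in this answer, and the variable names are only illustrative. It compares an exact string, a compiled pattern, and a function that strips whitespace before comparing.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical fragment modeled on the page markup; not fetched from the site.
html = """
<div class="col-12 col-md-4 text-right">
    <strong><span class="price">€ 24.95</span></strong><br/>
    ISBN <strong>9789090374475</strong><br/>
</div>
"""

doc = BeautifulSoup(html, "html.parser")

# Exact match: the text node is "\n    ISBN ", not "ISBN", so nothing is found.
exact = doc.find_all(string="ISBN")

# Pattern match: re.compile("ISBN") matches any text node containing "ISBN".
pattern = doc.find_all(string=re.compile("ISBN"))

# Function match: strip the surrounding whitespace before comparing.
stripped = doc.find_all(string=lambda s: s is not None and s.strip() == "ISBN")

print(len(exact), len(pattern), len(stripped))
```

The function form is useful when a substring match would be too broad, since it still requires the whole text node to be "ISBN" once the whitespace is removed.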

But it is more useful to search for the HTML tags that contain the string "ISBN":

for tag in doc.select(':-soup-contains-own("ISBN")'):
    print(tag.prettify())

Prints:


...

<div class="col-12 col-md-4 text-right">
 <strong>
  <span class="price">
   € 24.95
  </span>
 </strong>
 <br/>
 Van
 <strong>
  01-10-2023
 </strong>
 t.e.m.
 <strong>
  01-04-2024
 </strong>
 <br/>
 ISBN
 <strong>
  9789090374475
 </strong>
 <br/>
</div>

...
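Once the enclosing tag is found, the individual values can be pulled out of it. The following is a sketch under assumptions, not a tested scraper for the live site: the HTML is a hypothetical fragment copied from the output above, extract_book is an illustrative helper name, and it assumes the ISBN value is the first strong tag after the "ISBN" label, as in that output.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical fragment based on the printed output; not fetched from the site.
html = """
<div class="col-12 col-md-4 text-right">
 <strong><span class="price">€ 24.95</span></strong><br/>
 Van <strong>01-10-2023</strong> t.e.m. <strong>01-04-2024</strong><br/>
 ISBN <strong>9789090374475</strong><br/>
</div>
"""

doc = BeautifulSoup(html, "html.parser")


def extract_book(block):
    """Illustrative helper: read the price and ISBN from one result block."""
    price_tag = block.select_one("span.price")
    # Locate the "ISBN" label text, then take the next <strong> after it.
    label = block.find(string=re.compile("ISBN"))
    isbn_tag = label.find_next("strong") if label else None
    return {
        "price": price_tag.get_text(strip=True) if price_tag else None,
        "isbn": isbn_tag.get_text(strip=True) if isbn_tag else None,
    }


# Select every tag whose own text contains "ISBN" and extract its fields.
books = [extract_book(tag) for tag in doc.select(':-soup-contains-own("ISBN")')]
print(books)
```

Returning None for a missing field keeps one incomplete result block from raising an exception for the whole page.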
