Questions related to beautifulsoup

I am scraping an HTML page and want to store every <li> that contains the string "is" in a list. However, my code only stores the first two, and I can't figure out what I am doing wrong. I am using BeautifulSoup for the scraping.

    <ul class="fun-facts">
      <li>Owned my dream car in high school <a href="#footer">1</a></li>
      <li>Middle name is Ronald</li>
      <li>Never had been on a plane until college</li>
      <li>Dunkin Donuts coffee is better than Starbucks</li>
      <li>A favorite book series of mine is Ender's Game</li>
      <li>Current video game of choice is Rocket League</li>
      <li>The band that I've seen the most times live is the Zac Brown Band</li>
    </ul>

My code:

    import re
    fun_facts = webpage.find('ul', attrs={'class', 'fun-facts'})
    fun_facts_with_is = fun_facts.find_all('li', string=re.compile("is"))
    fun_facts_with_is

The result it returns:

    [<li>Middle name is Ronald</li>, <li>Dunkin Donuts coffee is better than Starbucks</li>]

The result I am looking for:

    ['Middle name is Ronald', 'Dunkin Donuts coffee is better than Starbucks', "A favorite book series of mine is Ender's Game", 'Current video game of choice is Rocket League', "The band that I've seen the most times live is the Zac Brown Band"]

This behavior is described in the BeautifulSoup documentation:

    If a tag contains more than one thing, then it's not clear what .string should refer to, so .string is defined to be None

In your case, the "thing" is any other tag inside the <li>. The string= filter is matched against .string, so an <li> with nested tags never matches. To get the result you want, you can modify the code to test the full text of each <li> instead:

    import re
    from bs4 import BeautifulSoup

    html_source = """\
    <ul class="fun-facts">
    <li>Owned my dream car in high school <a href="#footer">1</a></li>
    <li>Middle name is Ronald</li>
    <li>Never had been on a plane until college</li>
    <li>Dunkin Donuts coffee is better than Starbucks</li>
    <li>A favorite book series of mine is Ender's Game</li>
    <li>Current video game of choice is Rocket League</li>
    <li>The band that I've seen the most times live is the Zac Brown Band</li>
    </ul>"""

    webpage = BeautifulSoup(html_source, "html.parser")
    # attrs takes a dict mapping attribute name to value
    fun_facts = webpage.find("ul", attrs={"class": "fun-facts"})

    out = []
    for li in fun_facts.find_all("li"):
        # li.text joins the text of all nested tags, unlike li.string
        if "is" in li.text:
            out.append(li.text)

    print(out)

Prints:

    [
        "Middle name is Ronald",
        "Dunkin Donuts coffee is better than Starbucks",
        "A favorite book series of mine is Ender's Game",
        "Current video game of choice is Rocket League",
        "The band that I've seen the most times live is the Zac Brown Band",
    ]
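The difference between .string and the full text of a tag can be seen in a small sketch. The <em> wrapper in the third item is hypothetical markup, added here only to reproduce a nested tag; it is not part of the page in the question. The helper name li_mentions_is is likewise made up for illustration. Passing a function to find_all is one possible alternative to the explicit loop, not the only way to do it.

```python
from bs4 import BeautifulSoup

# Shortened sample; the <em> in the last item is an assumption used to
# show what happens when an <li> contains a nested tag.
html_source = """\
<ul class="fun-facts">
<li>Owned my dream car in high school <a href="#footer">1</a></li>
<li>Middle name is Ronald</li>
<li>A favorite book series of mine is <em>Ender's Game</em></li>
</ul>"""

soup = BeautifulSoup(html_source, "html.parser")
items = soup.find("ul", attrs={"class": "fun-facts"}).find_all("li")

# .string is None whenever the tag has more than one child node,
# which is why string=re.compile(...) skips those items.
strings = [li.string for li in items]


def li_mentions_is(tag):
    """Hypothetical filter: an <li> whose combined text contains 'is'."""
    return tag.name == "li" and "is" in tag.get_text()


# find_all accepts a function that receives each tag and returns a bool.
matches = [li.get_text() for li in soup.find_all(li_mentions_is)]

print(strings)
print(matches)
```

Note that a plain substring test also matches words such as "this" or "list"; if only the standalone word is wanted, a pattern like re.compile(r"\bis\b") applied to get_text() would be a stricter check.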

python beautifulsoup python-re

Answers: 1 Votes: 0

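Parsing a discussion forum only gets me the first user comment, not the other users' replies

Can someone help me out? I can't seem to figure this out. I have a file with a list of URLs that looks like this: https://community.appian.com/discussions/f/administration/14/integrate-token-d...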

python web-scraping beautifulsoup

Answers: 1 Votes: 0

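Scraping MDPI to extract email addresses

I have the following code, which is supposed to open a web page from MDPI (a medical database) and extract a list of 20 articles. It should then go to each article's URL and extract the first email address it finds...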

python beautifulsoup

Answers: 1 Votes: 0

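dict.has_key(somekey) behaves differently from somekey in dict

I recently ran into a strange Python dictionary problem while playing with BeautifulSoup. My code looks like this. import urllib2 from BeautifulSoup import BeautifulSoup response = urllib2.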