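I'm scraping an HTML page and want to store every <li> that contains the string "is" in a list. However, my code only stores the first two, and I can't figure out what I'm doing wrong. I'm using BeautifulSoup for the scraping.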
<ul class="fun-facts">
<li>Owned my dream car in high school <a href="#footer"><sup>1</sup></a></li>
<li>Middle name is Ronald</li>
<li>Never had been on a plane until college</li>
<li>Dunkin Donuts coffee is better than Starbucks</li>
<li>A favorite book series of mine is <i>Ender's Game</i></li>
<li>Current video game of choice is <i>Rocket League</i></li>
<li>The band that I've seen the most times live is the <i>Zac Brown Band</i></li>
</ul>
My code:
import re
fun_facts = webpage.find('ul', attrs={'class': 'fun-facts'})
fun_facts_with_is = fun_facts.find_all('li', string=re.compile("is"))
fun_facts_with_is
This returns:
[<li>Middle name is Ronald</li>,
<li>Dunkin Donuts coffee is better than Starbucks</li>]
The result I'm looking for:
['Middle name is Ronald',
'Dunkin Donuts coffee is better than Starbucks',
"A favorite book series of mine is Ender's Game",
'Current video game of choice is Rocket League',
"The band that I've seen the most times live is the Zac Brown Band"]
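This behavior is explained in the BeautifulSoup documentation:
"If a tag contains more than one thing, then it's not clear what .string should refer to, so .string is defined to be None"
In your case, the "thing" is the other tags (<a>, <i>) inside the <li>. The string= argument is compared against each tag's .string, so any <li> that contains a nested tag has .string equal to None and never matches the regex.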
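To make the quoted rule concrete, here is a minimal sketch that compares .string and get_text() on two <li> elements. The two-item HTML snippet and the variable names are made up for illustration and are not part of the original page.

```python
from bs4 import BeautifulSoup

# Hypothetical two-item snippet: one plain <li>, one with a nested <i>
soup = BeautifulSoup(
    "<ul><li>Middle name is Ronald</li>"
    "<li>Current video game of choice is <i>Rocket League</i></li></ul>",
    "html.parser",
)
plain, nested = soup.find_all("li")

plain_string = plain.string      # single text child, so .string is that text
nested_string = nested.string    # text plus an <i> child, so .string is None
nested_text = nested.get_text()  # get_text() joins the text of all descendants

print(repr(plain_string))
print(repr(nested_string))
print(repr(nested_text))
```

Because string=re.compile("is") is tested against .string, the second item can never match, which is why only the <li> elements without nested tags came back.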
To get the result you want, you can modify the code like this:
import re
from bs4 import BeautifulSoup
html_source = """\
<ul class="fun-facts">
<li>Owned my dream car in high school <a href="#footer"><sup>1</sup></a></li>
<li>Middle name is Ronald</li>
<li>Never had been on a plane until college</li>
<li>Dunkin Donuts coffee is better than Starbucks</li>
<li>A favorite book series of mine is <i>Ender's Game</i></li>
<li>Current video game of choice is <i>Rocket League</i></li>
<li>The band that I've seen the most times live is the <i>Zac Brown Band</i></li>
</ul>"""
webpage = BeautifulSoup(html_source, "html.parser")
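# attrs is a dict mapping attribute name to value
fun_facts = webpage.find("ul", attrs={"class": "fun-facts"})
out = []
for li in fun_facts.find_all("li"):
    # li.text joins the text of all descendants, including nested tags like <i>
    if "is" in li.text:
        out.append(li.text)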
print(out)
Prints:
[
"Middle name is Ronald",
"Dunkin Donuts coffee is better than Starbucks",
"A favorite book series of mine is Ender's Game",
"Current video game of choice is Rocket League",
"The band that I've seen the most times live is the Zac Brown Band",
]
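If you would rather keep the filtering inside find_all, one possible variant is to pass a function instead of string=. This is only a sketch: the helper name li_mentions_is, the IS_WORD pattern, and the shortened HTML are assumptions for illustration. It also uses a word-boundary regex, since a plain "is" substring test would match words such as "this" as well.

```python
import re
from bs4 import BeautifulSoup

# Shortened, hypothetical version of the list from the question
html_source = """<ul class="fun-facts">
<li>Middle name is Ronald</li>
<li>Never had been on a plane until college</li>
<li>A favorite book series of mine is <i>Ender's Game</i></li>
</ul>"""

# \b restricts the match to the standalone word "is"
IS_WORD = re.compile(r"\bis\b")

def li_mentions_is(tag):
    """Return True for <li> tags whose full text contains the word 'is'."""
    return tag.name == "li" and IS_WORD.search(tag.get_text()) is not None

soup = BeautifulSoup(html_source, "html.parser")
# find_all accepts a function that takes a tag and returns True or False
matches = [li.get_text() for li in soup.find_all(li_mentions_is)]
print(matches)
```

The function receives each tag and checks get_text() rather than .string, so nested <i> tags no longer cause items to be skipped.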