re.findall only returns span contents that do not contain \n

Question · Votes: -1 · Answers: 1

I am trying to extract the content inside span tags that have the following structure:

<span style="font-weight:bold">xxx</span>

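I receive a large block of HTML from a web service, and from it I extract the span tags that have this structure.

The problem is that if the content of a span contains \n, it is not extracted.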

For example:

print(re.findall(pattern, '<span style="font-weight:bold">AAA\n</span><span style="font-weight:bold">ooo</span>'))
>>[ooo]
#output desired should be [AAA,ooo]

How can I fix this so that the span content is extracted whether or not it contains \n?

python

1 Answer

3 votes

Use BeautifulSoup to work with the elements in the HTML:

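from bs4 import BeautifulSoup

h = """<span style="font-weight:bold">xxx</span>"""
# Name the parser explicitly; otherwise bs4 picks one and emits a warning.
soup = BeautifulSoup(h, "html.parser")
# find_all returns every <span> element in the document.
spans = soup.find_all("span")
for span in spans:
    # .text is the element's text content, including any newlines inside it.
    print(span.text)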

Output:

xxx
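
If you would rather stay with re.findall, the likely cause is that "." does not match a newline by default, so a span whose content contains \n never matches. The question does not show its pattern, so the one below is only an assumed stand-in. This is a minimal sketch that passes re.DOTALL so that "." also matches \n, and then strips the surrounding whitespace to get the desired output:

```python
import re

# Assumed pattern: the question never shows its own, so this one is illustrative.
# re.DOTALL lets "." match "\n"; the non-greedy ".*?" stops at the first </span>.
pattern = re.compile(r'<span style="font-weight:bold">(.*?)</span>', re.DOTALL)

html = '<span style="font-weight:bold">AAA\n</span><span style="font-weight:bold">ooo</span>'

# Strip surrounding whitespace so "AAA\n" becomes "AAA".
contents = [text.strip() for text in pattern.findall(html)]
print(contents)  # ['AAA', 'ooo']
```

Without the re.DOTALL flag the same pattern returns only ['ooo'] for this input, which matches the behaviour described in the question. For anything beyond a fixed, simple structure like this one, an HTML parser is usually the more robust choice.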
