Finding part of the text in an H tag inside a DIV class with BeautifulSoup

Question

I have HTML that looks like this inside a div with the class content:

<h2>
 <strong>
 Brookstone
 </strong>
 AS20194 Multi-functional Massage Chair
</h2>

My Python code is

soup.find('div',attrs={'class':'content'}).h2.text

It returns

Brookstone
                         AS20194 Multi-functional Massage Chair

How should I update the code so that it returns

AS20194 Multi-functional Massage Chair

Answer 1

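I'm not really a Beautiful Soup expert, but as far as I can see it is correctly returning the "text" part of the markup, which includes the text inside the nested strong tag. If there is an option to select only part of the content, the bs4 documentation would be the place to look.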
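
To illustrate why .text includes the brand name, here is a minimal sketch. It assumes the question's h2 sits inside a div with class content (that wrapper is reconstructed from the question's code, not shown in the original HTML) and uses the built-in html.parser.

```python
from bs4 import BeautifulSoup

# Assumed markup: the question's <h2> wrapped in <div class="content">
html_doc = """<div class="content">
<h2>
 <strong>
 Brookstone
 </strong>
 AS20194 Multi-functional Massage Chair
</h2>
</div>"""

soup = BeautifulSoup(html_doc, "html.parser")
h2 = soup.find("div", attrs={"class": "content"}).h2

# .strings walks every string nested under <h2>, including the one in <strong>;
# .text is simply all of these joined together
all_strings = [s.strip() for s in h2.strings if s.strip()]
print(all_strings)

# recursive=False restricts the search to direct children of <h2>,
# so the contents of <strong> are skipped
direct = [s.strip() for s in h2.find_all(string=True, recursive=False) if s.strip()]
print(direct)
```

The second list contains only the text that belongs to the h2 itself, which is what the question asks for.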

Answer 2

You can use extract() to remove the strong tag from the tree before reading the text. You can try:

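from bs4 import BeautifulSoup

html_doc="""<h2>
 <strong>
 Brookstone
 </strong>
 AS20194 Multi-functional Massage Chair
</h2>"""

soup = BeautifulSoup(html_doc, 'lxml')

# find_all() returns the <strong> tags themselves; looping over find()
# would iterate over the children of the first <strong> instead
for strong in soup.find_all("strong"):
    strong.extract()
# strip() drops the newlines and indentation left around the remaining text
print(soup.text.strip())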

The output will be:

AS20194 Multi-functional Massage Chair
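
One thing to keep in mind is that extract() modifies the tree in place and returns the tag it removed. A small sketch of that behaviour, using html.parser and the variable names brand and model purely for illustration:

```python
from bs4 import BeautifulSoup

html_doc = """<h2>
 <strong>
 Brookstone
 </strong>
 AS20194 Multi-functional Massage Chair
</h2>"""

soup = BeautifulSoup(html_doc, "html.parser")

# extract() detaches <strong> from the tree and hands it back
removed = soup.h2.strong.extract()

# The removed tag still holds the brand name
brand = removed.get_text(strip=True)

# The <h2> left in the tree no longer contains it
model = soup.h2.get_text(strip=True)

print(brand)
print(model)
```

If the brand is needed later, keep the return value; otherwise the soup object has to be parsed again to get it back.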

Answer 3

There is no need for .extract(); you can use .find_next_sibling() with the argument text=True (newer versions of bs4 call this argument string):

from bs4 import BeautifulSoup


txt = '''<h2>
 <strong>
 Brookstone
 </strong>
 AS20194 Multi-functional Massage Chair
</h2>'''

soup = BeautifulSoup(txt, 'html.parser')

print(soup.h2.strong.find_next_sibling(text=True))

Prints:

 AS20194 Multi-functional Massage Chair
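
Combined with the div lookup from the question, this might look like the sketch below. The helper name text_after_strong and the div wrapper in the sample markup are assumptions made for illustration; string=True is the current spelling of text=True.

```python
from bs4 import BeautifulSoup


def text_after_strong(soup):
    """Hypothetical helper: text following <strong> in div.content > h2, or None."""
    div = soup.find("div", attrs={"class": "content"})
    if div is None or div.h2 is None or div.h2.strong is None:
        return None
    # First sibling after <strong> that is a plain string rather than a tag
    sibling = div.h2.strong.find_next_sibling(string=True)
    # strip() removes the newline and indentation surrounding the text
    return sibling.strip() if sibling is not None else None


# Assumed markup: the question's <h2> wrapped in <div class="content">
txt = """<div class="content">
<h2>
 <strong>
 Brookstone
 </strong>
 AS20194 Multi-functional Massage Chair
</h2>
</div>"""

soup = BeautifulSoup(txt, "html.parser")
print(text_after_strong(soup))
```

Returning None when any of the tags is missing avoids an AttributeError on pages that do not follow this layout.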
