How to remove empty tags with Beautiful Soup 4

I have generated HTML that in some places contains empty tags such as <p><br/></p>, <p>\n\t</p>, or <p><strong></strong><strong></strong></p>. I want to remove them.

for tag in soup("strong"):
    if len(tag.get_text(strip=True)) == 0:
        print(tag)
for tag in soup("p"):
    if len(tag.get_text(strip=True)) == 0:
        print(tag)

But this also catches p tags that only wrap an img tag, since the img is not counted as the tag's inner text. Any help is appreciated.

python html beautifulsoup

1 Answer

Here are two ways to remove empty tags.

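for x in soup.find_all():
    # No visible text once whitespace is stripped: detach the tag from the tree.
    # Note: this also detaches text-less tags such as <img> and <br/>.
    if len(x.get_text(strip=True)) == 0:
        x.extract()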

The second way:

for x in soup.find_all(lambda tag: not tag.contents and tag.name != 'br'):
    x.decompose()  # destroys tags that have no children at all, except <br/>
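
Neither snippet above spares a p tag that only wraps an img, which is the case raised in the question. One possible way to handle it, as a minimal sketch: treat a tag as empty only if it has no text and no "content" descendant such as img. The helper name remove_empty_tags and the CONTENT_TAGS list are assumptions made for illustration and are not part of Beautiful Soup; adjust the list to whatever text-less elements matter in your HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical list: tags that count as content even though they hold no text.
CONTENT_TAGS = ["img", "hr", "input", "iframe", "video", "audio"]


def remove_empty_tags(soup):
    """Destroy tags that have no text and no CONTENT_TAGS descendant."""
    # Reverse document order visits children before their parents,
    # so an already destroyed element is never touched again.
    for tag in reversed(soup.find_all(True)):
        # Keep content tags; leave <br/> alone so line breaks inside
        # real text survive (it disappears together with an empty parent).
        if tag.name in CONTENT_TAGS or tag.name == "br":
            continue
        if tag.get_text(strip=True):
            continue  # has visible text
        if tag.find(CONTENT_TAGS) is not None:
            continue  # wraps an image or similar element
        tag.decompose()
    return soup


html = (
    "<div><p><br/></p><p>\n\t</p>"
    "<p><strong></strong><strong></strong></p>"
    '<p><img src="a.png"/></p><p>Hello<br/>World</p></div>'
)
soup = BeautifulSoup(html, "html.parser")
remove_empty_tags(soup)
cleaned = str(soup)
print(cleaned)
# <div><p><img src="a.png"/></p><p>Hello<br/>World</p></div>
```

The three empty p tags from the question are removed, while the paragraph holding the img and the paragraph with a br inside real text are kept.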
