How to replace tags with a space in Beautiful Soup

Question (votes: 1, answers: 3)

Suppose I have:

text = """ <a href = 'http://www.crummy.com/software'>Hello There</a>"""

I want to replace the <a href=...> and </a> tags with a space (" ") instead. BTW, it is a BeautifulSoup.BeautifulSoup object, so a plain .replace won't work.

I want the text to be just:

""" Hello There """

Note the spaces before and after "Hello There".

python html html-parsing beautifulsoup
3 Answers
3 votes

You can use replace_with() (or its older camelCase alias, replaceWith()):

from bs4 import BeautifulSoup

soup = BeautifulSoup("""
<html>
 <body>
  <a href = 'http://www.crummy.com/software'>Hello There</a>
 </body>
</html>
""")

# Replace each <a> tag with its text, padded by one space on each side
for a in soup.find_all('a'):
    a.replace_with(" %s " % a.string)

print(soup)

Prints:

<html><body>
 Hello There 
</body></html>
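The loop above relies on a.string, which returns None when the link has more than one child (for example <a>Hello <b>There</b></a>), so the result would read " None ". Here is a minimal sketch that uses get_text() instead. It assumes the built-in "html.parser" backend, and links_to_spaced_text is a hypothetical helper name:

```python
from bs4 import BeautifulSoup


def links_to_spaced_text(html):
    """Replace every <a> tag with its text padded by one space on each side."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all() returns a list up front, so replacing tags while looping is safe
    for a in soup.find_all("a"):
        # get_text() joins all nested strings; a.string would be None here
        a.replace_with(f" {a.get_text()} ")
    return str(soup)


print(links_to_spaced_text('<p><a href="x">Hello <b>There</b></a></p>'))
# <p> Hello There </p>
```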

2 votes

Use .replace_with() and the .text attribute:

>>> from bs4 import BeautifulSoup as BS
>>> text = """ <a href = 'http://www.crummy.com/software'>Hello There</a>"""
>>> soup = BS(text, 'html.parser')
>>> mytag = soup.find('a')
>>> mytag.replace_with(mytag.text + ' ')
<a href="http://www.crummy.com/software">Hello There</a>
>>> print(soup)
 Hello There 
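In the session above, replace_with() echoes the <a> tag because it returns the element it removed from the tree. The following short sketch keeps that return value so the link target can still be read after the replacement. It assumes "html.parser", so the output matches the session:

```python
from bs4 import BeautifulSoup

text = """ <a href = 'http://www.crummy.com/software'>Hello There</a>"""
soup = BeautifulSoup(text, "html.parser")

tag = soup.find("a")
# replace_with() returns the removed tag, which still carries its attributes
removed = tag.replace_with(tag.get_text() + " ")

print(removed["href"])  # http://www.crummy.com/software
print(repr(str(soup)))  # ' Hello There '
```

The leading space comes from the original text, which already starts with a space. The replacement adds only the trailing space.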

-1 votes

>>> import re
>>> text = """ <a href = 'http://www.crummy.com/software'>Hello There</a>"""
>>> # Replace every tag (shortest match from "<" to ">") with a space
>>> notag = re.sub(r"<.*?>", " ", text)
>>> notag
'  Hello There '
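The regex treats the first ">" as the end of the tag, so it breaks when an attribute value contains ">". The sketch below contrasts it with a swapped-in, standard-library alternative: a small html.parser.HTMLParser subclass that emits a space for every start or end tag. TagToSpace and tags_to_spaces are hypothetical names, and comments in the input are silently dropped:

```python
import re
from html.parser import HTMLParser


class TagToSpace(HTMLParser):
    """Collect text, emitting a single space in place of every start or end tag."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; etc. are decoded
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(" ")

    def handle_endtag(self, tag):
        self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def tags_to_spaces(html):
    parser = TagToSpace()
    parser.feed(html)
    parser.close()  # flush any text still buffered after the last tag
    return "".join(parser.parts)


tricky = '<a title="1 > 0">Hi</a>'
print(repr(re.sub(r"<.*?>", " ", tricky)))  # '  0">Hi '
print(repr(tags_to_spaces(tricky)))         # ' Hi '
```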

See this answer.

© www.soinside.com 2019 - 2024. All rights reserved.