BeautifulSoup: get the content of <> tags

Question (votes: -1, answers: 1)

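I have a set of scraped pages that I have to work with (I cannot scrape them again). They contain meta information in tags whose angle brackets are escaped as &lt; and &gt;, like this: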

 ...
 <span class="html-tag">
 &lt;meta <span class="html-attribute-name">name</span>="
 <span class="html-attribute-value">twitter:title</span>" 
 <span class="html-attribute-name">property</span>="
 <span class="html-attribute-value">og:title</span>" 
 <span class="html-attribute-name">content</span>="
 <span class="html-attribute-value">Smart TV wifi won't turn on</span>" /&gt;
 ...

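My file contains hundreds of different 'html-attribute-value' spans. Is it possible to use BeautifulSoup to get the content from these escaped meta tags? In this case, I need to get "Smart TV wifi won't turn on". How can I do that?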

beautifulsoup meta-tags
1 answer
0 votes
from bs4 import BeautifulSoup


html = """ ...
 <span class="html-tag">
 &lt;meta <span class="html-attribute-name">name</span>="
 <span class="html-attribute-value">twitter:title</span>" 
 <span class="html-attribute-name">property</span>="
 <span class="html-attribute-value">og:title</span>" 
 <span class="html-attribute-name">content</span>="
 <span class="html-attribute-value">Smart TV wifi won't turn on</span>" /&gt;
 ...
 """


soup = BeautifulSoup(html, 'html.parser')

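# In this snippet the third 'html-attribute-value' span (index 2)
# holds the value of the content attribute.
values = soup.find_all("span", class_="html-attribute-value")
print(values[2].get_text(strip=True))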
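
The fixed index above only works when the content value happens to be the third span. With hundreds of 'html-attribute-value' spans in the file, it may be safer to pair each 'html-attribute-name' span with the value span that follows it and then pick values by attribute name. The following is a minimal sketch under that assumption. The helper name quoted_attribute_pairs is hypothetical, and the sketch assumes every attribute name is followed by its own value span (a valueless attribute would be paired with the next attribute's value).

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; the "..." lines are elided content.
html = """ ...
 <span class="html-tag">
 &lt;meta <span class="html-attribute-name">name</span>="
 <span class="html-attribute-value">twitter:title</span>"
 <span class="html-attribute-name">property</span>="
 <span class="html-attribute-value">og:title</span>"
 <span class="html-attribute-name">content</span>="
 <span class="html-attribute-value">Smart TV wifi won't turn on</span>" /&gt;
 ...
 """


def quoted_attribute_pairs(markup):
    """Return (name, value) pairs in document order.

    Hypothetical helper: each 'html-attribute-name' span is paired with
    the next 'html-attribute-value' span that appears after it.
    """
    soup = BeautifulSoup(markup, "html.parser")
    pairs = []
    for name_span in soup.find_all("span", class_="html-attribute-name"):
        value_span = name_span.find_next("span", class_="html-attribute-value")
        if value_span is not None:
            pairs.append(
                (name_span.get_text(strip=True), value_span.get_text(strip=True))
            )
    return pairs


pairs = quoted_attribute_pairs(html)

# Keep only the values that belong to a content attribute.
contents = [value for name, value in pairs if name == "content"]
print(contents)
```

Returning a list of pairs rather than a dict keeps every occurrence when the file contains several escaped meta tags, each with its own content attribute.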