Web scraping - getting an element from HTML by its class

Question

I have the following HTML:

<div class="ui_columns is-gapless is-mobile">
    <div class="ui_column is-4 providerLogoOuter">
        <span class="providerLogoInner" title=""><imgsrc="https://static.tacdn.com/img2/branding/hotels/Hoteiscom_384x164.png" class="providerImg" alt="Hoteis.com">

However, I only need the value of the alt attribute, "Hoteis.com".

I tried to get it with BeautifulSoup, but how do I reach this element?

name_player = soup.find_all(class_='providerLogoInner')[0]

This returns no element.

Answer 1

Is the HTML really malformed, or is "<imgsrc" (no space between "img" and "src") a typo? Either way, this works:

html="""
<div class="ui_columns is-gapless is-mobile">
<div class="ui_column is-4 providerLogoOuter">
<span class="providerLogoInner" title=""><imgsrc="https://static.tacdn.com/img2/branding/hotels/Hoteiscom_384x164.png" class="providerImg" alt="Hoteis.com">
"""
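from bs4 import BeautifulSoup

# The 'html5lib' parser must be installed separately (pip install html5lib)
soup = BeautifulSoup(html, 'html5lib')
# Look the element up by its class, then read its alt attribute
print(soup.find(class_='providerImg')['alt'])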

Output:

Hoteis.com
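
A minimal sketch of the same lookup with a guard for the "no element" case, since find() returns None when nothing matches and indexing None raises a TypeError. It assumes well-formed markup (an <img src=...> tag with a space) and the built-in 'html.parser'; SAMPLE_HTML and get_alt_by_class are hypothetical names introduced only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the markup from the question, but with a well-formed <img src=...> tag.
SAMPLE_HTML = """
<div class="ui_columns is-gapless is-mobile">
    <div class="ui_column is-4 providerLogoOuter">
        <span class="providerLogoInner" title="">
            <img src="https://static.tacdn.com/img2/branding/hotels/Hoteiscom_384x164.png" class="providerImg" alt="Hoteis.com">
        </span>
    </div>
</div>
"""


def get_alt_by_class(html, css_class):
    """Return the alt text of the first element with css_class, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(class_=css_class)
    if element is None:
        # Nothing matched: report that instead of failing on None['alt'].
        return None
    return element.get("alt")


print(get_alt_by_class(SAMPLE_HTML, "providerImg"))  # Hoteis.com
print(get_alt_by_class(SAMPLE_HTML, "noSuchClass"))  # None
```

If the function returns None on the real page, it may be worth checking whether the downloaded HTML actually contains the element before looking for a bug in the selector.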

Answer 2

You can do:

from bs4 import BeautifulSoup


raw = '''
<div class="ui_columns is-gapless is-mobile">
    <div class="ui_column is-4 providerLogoOuter">
        <span class="providerLogoInner" title=""><imgsrc="https://static.tacdn.com/img2/branding/hotels/Hoteiscom_384x164.png" class="providerImg" alt="Hoteis.com">
'''

soup = BeautifulSoup(raw, 'html5lib')

# next_element is the node parsed right after the <span>, i.e. the image tag
hotel_lnk = soup.find('span', {'class': 'providerLogoInner'}).next_element['alt']

print(hotel_lnk)

# Hoteis.com
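
As an alternative to navigating from the span to the node that follows it, the lookup can be sketched with CSS selectors, which also extends to pages listing several providers. This assumes well-formed <img> tags and the built-in 'html.parser'; PROVIDERS_HTML, the image file names, and the second provider "Example.com" are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two provider logos; "Example.com" is made up.
PROVIDERS_HTML = """
<div class="ui_columns is-gapless is-mobile">
    <span class="providerLogoInner" title=""><img src="a.png" class="providerImg" alt="Hoteis.com"></span>
    <span class="providerLogoInner" title=""><img src="b.png" class="providerImg" alt="Example.com"></span>
</div>
"""

soup = BeautifulSoup(PROVIDERS_HTML, "html.parser")

# select_one returns the first match, or None when nothing matches.
first_logo = soup.select_one("span.providerLogoInner img.providerImg")
first_alt = first_logo.get("alt") if first_logo is not None else None

# select returns every match in document order; img[alt] skips images without alt text.
all_alts = [img["alt"] for img in soup.select("span.providerLogoInner img[alt]")]

print(first_alt)  # Hoteis.com
print(all_alts)   # ['Hoteis.com', 'Example.com']
```

Selecting the image directly does not depend on it being the very first node inside the span, so it keeps working if whitespace or other tags are added before it.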
