Getting ratings from an aria-label with Beautiful Soup

Question (votes: 0, answers: 1)

I have a soup object like this:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.yelp.com/biz/panera-bread-markham')
soup = BeautifulSoup(r.text, 'html.parser')

and I am trying to find the ratings with the following code:

rating_list = soup.find_all('span', {"class":"lemon--span__373c0__3997G display--inline__373c0__1DbOG border-color--default__373c0__2oFDT"})
rating_list

The output is a list like this:

[<span class="lemon--span__373c0__3997G display--inline__373c0__1DbOG border-color--default__373c0__2oFDT"><div aria-label="3 star rating" class="lemon--div__373c0__1mboc i-stars__373c0__Y2F3O i-stars--large-3__373c0__2oM4P border-color--default__373c0__2oFDT overflow--hidden__373c0__8Jq2I" role="img"><img alt="" class="lemon--img__373c0__3GQUb offscreen__373c0__1KofL" height="560" src="https://s3-media0.fl.yelpcdn.com/assets/public/stars.yelp_design_web.yji-9bec2045845c24d3bff3ddb582884eda.png" width="132"/></div></span>,
 <span class="lemon--span__373c0__3997G display--inline__373c0__1DbOG border-color--default__373c0__2oFDT"><div aria-label="4 star rating" class="lemon--div__373c0__1mboc i-stars__373c0__Y2F3O i-stars--regular-4__373c0__3acau border-color--default__373c0__2oFDT overflow--hidden__373c0__8Jq2I" role="img"><img alt="" class="lemon--img__373c0__3GQUb offscreen__373c0__1KofL" height="560" src="https://s3-media0.fl.yelpcdn.com/assets/public/stars.yelp_design_web.yji-9bec2045845c24d3bff3ddb582884eda.png" width="132"/></div></span>,
 <span class="lemon--span__373c0__3997G display--inline__373c0__1DbOG border-color--default__373c0__2oFDT"><div aria-label="5 star rating" class="lemon--div__373c0__1mboc i-stars__373c0__Y2F3O i-stars--regular-5__373c0__ySHIl border-color--default__373c0__2oFDT overflow--hidden__373c0__8Jq2I" role="img"><img alt="" class="lemon--img__373c0__3GQUb offscreen__373c0__1KofL" height="560" src="https://s3-media0.fl.yelpcdn.com/assets/public/stars.yelp_design_web.yji-9bec2045845c24d3bff3ddb582884eda.png" width="132"/></div></span>,
 <span class="lemon--span__373c0__3997G display--inline__373c0__1DbOG border-color--default__373c0__2oFDT"><div aria-label="3 star rating" class="lemon--div__373c0__1mboc i-stars__373c0__Y2F3O i-stars--regular-3__373c0__1DXMK border-color--default__373c0__2oFDT overflow--hidden__373c0__8Jq2I" role="img"><img alt="" class="lemon--img__373c0__3GQUb offscreen__373c0__1KofL" height="560" src="https://s3-media0.fl.yelpcdn.com/assets/public/stars.yelp_design_web.yji-9bec2045845c24d3bff3ddb582884eda.png" width="132"/></div></span>,
 <span class="lemon--span__373c0__3997G display--inline__373c0__1DbOG border-color--default__373c0__2oFDT"><p class="lemon--p__373c0__3Qnnj text__373c0__2pB8f text-color--mid__373c0__3G312 text-align--left__373c0__2pnx_ text-size--small__373c0__3SGMi"><span aria-hidden="true" class="lemon--span__373c0__3997G icon__373c0__ehCWV icon--18-check-in" style="width:18px;height:18px;fill:#0077bc"><svg class="icon_svg" height="18" viewbox="0 0 18 18" width="18" xmlns="http://www.w3.org/2000/svg"><path d="M18 9l-2.136-1.84.932-2.66-2.772-.525-.524-2.77-2.66.93L8.997 0 7.163 2.136 4.5 1.206l-.525 2.77-2.77.524.932 2.66L0 9l2.137 1.84-.932 2.66 2.77.525.526 2.77 2.664-.932L8.998 18l1.84-2.137 2.662.932.524-2.77 2.772-.524-.932-2.66L18 9zm-9.85 3.23L5.324 9.4l1.13-1.13 1.698 1.696 3.396-3.395 1.13 1.134-4.525 4.525z"></path></svg></span> <!-- -->1 check-in</p></span>,
 <span class="lemon--span__373c0__3997G display--inline__373c0__1DbOG border-color--default__373c0__2oFDT"><div aria-label="1 star rating" class="lemon--div__373c0__1mboc i-stars__373c0__Y2F3O i-stars--regular-1__373c0__14nrQ border-color--default__373c0__2oFDT overflow--hidden__373c0__8Jq2I" role="img"><img alt="" class="lemon--img__373c0__3GQUb offscreen__373c0__1KofL" height="560" src="https://s3-media0.fl.yelpcdn.com/assets/public/stars.yelp_design_web.yji-9bec2045845c24d3bff3ddb582884eda.png" width="132"/></div></span>,
 <span class="lemon--span__373c0__3997G display--inline__373c0__1DbOG border-color--default__373c0__2oFDT"><p class="lemon--p__373c0__3Qnnj text__373c0__2pB8f text-color--mid__373c0__3G312 text-align--left__373c0__2pnx_ text-size--small__373c0__3SGMi"><span aria-hidden="true" class="lemon--span__373c0__3997G icon__373c0__ehCWV icon--18-check-in" style="width:18px;height:18px;fill:#0077bc"><svg class="icon_svg" height="18" viewbox="0 0 18 18" width="18" xmlns="http://www.w3.org/2000/svg"><path d="M18 9l-2.136-1.84.932-2.66-2.772-.525-.524-2.77-2.66.93L8.997 0 7.163 2.136 4.5 1.206l-.525 2.77-2.77.524.932 2.66L0 9l2.137 1.84-.932 2.66 2.77.525.526 2.77 2.664-.932L8.998 18l1.84-2.137 2.662.932.524-2.77 2.772-.524-.932-2.66L18 9zm-9.85 3.23L5.324 9.4l1.13-1.13 1.698 1.696 3.396-3.395 1.13 1.134-4.525 4.525z"></path></svg></span> <!-- -->1 check-in</p></span>,
         <span class="lemon--span__373c0__3997G display--inline__373c0__1DbOG border-color--default__373c0__2oFDT"><div aria-label="1 star .....
    .
    .
    .

Any suggestions on how to get the rating out of <div aria-label="3 star rating" ...>?

web-scraping beautifulsoup text-mining
1 Answer
2
votes

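There are actually several ways to do this, such as loading the JSON from a script tag or locating the relevant div. But I think the following approach is the clearest :)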

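import requests
from bs4 import BeautifulSoup


def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    # Collect every <meta itemprop="author"> tag (one per review).
    target = soup.find_all("meta", itemprop="author")
    for tar in target:
        # Print the author, then the content of the next <meta> tag,
        # which carries that review's rating in the output shown here.
        print(tar['content'], tar.find_next("meta")['content'])


main("https://www.yelp.com/biz/panera-bread-markham")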

Output:

Shia L. 4.0
Ryan L. 5.0
Chi K. 3.0
Joan T. 1.0
Nicky D S. 4.0
Matthew K. 3.0
Michelle W. 1.0
Jennifer C. 4.0
Niral P. 3.0
Shajitha R. 1.0
Veronica C. 3.0
Tanveer K. 1.0
Joey J. 2.0
Broadwaygirl M. 1.0
Sheena Y. 3.0
Wendy B. 4.0
Jacqueline L. 2.0
Mi S. 3.0
Sharon M. 2.0
Eduni C. 1.0
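
If you would rather read the aria-label attribute from the question directly, something along these lines may work. This is only a minimal sketch: the sample HTML is a trimmed-down stand-in for the spans shown in the question, the helper name `extract_ratings` is made up for illustration, and it assumes every label has the form "N star rating". Yelp's live markup may differ or change.

```python
import re
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the question's output (an assumption, not live Yelp markup).
SAMPLE_HTML = """
<span><div aria-label="3 star rating" role="img"></div></span>
<span><div aria-label="4 star rating" role="img"></div></span>
<span><p>1 check-in</p></span>
<span><div aria-label="1 star rating" role="img"></div></span>
"""

# Assumed label format: a number followed by " star rating".
RATING_PATTERN = re.compile(r"^(\d+(?:\.\d+)?) star rating$")


def extract_ratings(markup):
    """Return the numeric ratings found in div aria-label attributes."""
    soup = BeautifulSoup(markup, "html.parser")
    ratings = []
    # Passing a compiled regex makes find_all keep only divs whose
    # aria-label value matches the pattern.
    for div in soup.find_all("div", attrs={"aria-label": RATING_PATTERN}):
        match = RATING_PATTERN.match(div["aria-label"])
        ratings.append(float(match.group(1)))
    return ratings


print(extract_ratings(SAMPLE_HTML))  # [3.0, 4.0, 1.0]
```

Filtering on the attribute itself, rather than on the long generated class names, skips spans that have no rating (such as the "1 check-in" entries) and should be less sensitive to class-name changes.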