Web scraping Yelp: how do I retrieve the value of each individual rating? [duplicate]

Question (votes: 0, answers: 1)

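I'm working on a web scraping project to build my knowledge (I'm a beginner). The code is messy, but so far I can print the rating of each review. How do I extract the ratings from the bs4 objects (i.e. the 4.0 and 5.0 in the list below) and then take their average?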

Output:
[<meta content="4.0" itemprop="ratingValue"/>, <meta content="5.0" itemprop="ratingValue"/>, ... ]

Code:

import mechanize
from bs4 import BeautifulSoup

def searchYelp():

    br = mechanize.Browser()
    br.set_handle_robots(False)
    br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

    response = br.open('https://www.yelp.com')
    br.select_form(nr=0)
    br.form['find_desc'] = 'Del Taco'
    br.form['find_loc'] = 'New York City'
    br.submit()

    link_list = []
    for link in br.links():
        if link.url.startswith('/biz/'):
            link_list.append(link.url)
            break

    big_list_of_ratings = []
    yelpPage = br.open(link_list[0])
    soup = BeautifulSoup(yelpPage.read(), 'html.parser')

    for review in soup.find_all('meta'):
        if review.get('itemprop') == 'ratingValue':
            big_list_of_ratings.append(review)

    print(big_list_of_ratings)


searchYelp()

python beautifulsoup mechanize
1 Answer
1 vote

Instead of this:

for review in soup.find_all('meta'):
    if review.get('itemprop') == 'ratingValue':
        big_list_of_ratings.append(review)

append the value of the tag's content attribute, review['content'], like this:

for review in soup.find_all('meta'):
    if review.get('itemprop') == 'ratingValue':
        big_list_of_ratings.append(review['content'])
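
The values read from review['content'] are strings such as '4.0', so they have to be converted to numbers before they can be averaged, which is the second half of the question. A minimal sketch of that step, assuming big_list_of_ratings already holds the collected strings; the helper name average_rating is hypothetical and the sample values are the two shown in the question's output:

```python
from statistics import mean

def average_rating(rating_strings):
    """Convert rating strings such as '4.0' to floats and return their mean."""
    values = [float(s) for s in rating_strings]
    # Return None for an empty list instead of letting mean() raise.
    return mean(values) if values else None

big_list_of_ratings = ["4.0", "5.0"]  # sample values from the question's output
print(average_rating(big_list_of_ratings))  # 4.5
```

Returning None for an empty list is only one possible choice; it keeps the function from failing if the page contains no ratingValue tags.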

Alternatively, I would suggest using a CSS selector:

for review in soup.select('meta[itemprop="ratingValue"][content]'):
    big_list_of_ratings.append(review['content'])
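
To try the selector without hitting Yelp, it can be run against a small inline document. This is only a sketch: the HTML string below is a made-up stand-in for the real page, containing two ratingValue tags like those in the question's output plus one unrelated meta tag that the selector should skip.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Yelp page; only the meta tags matter here.
html = """
<div><meta content="4.0" itemprop="ratingValue"/></div>
<div><meta content="5.0" itemprop="ratingValue"/></div>
<div><meta content="Del Taco" itemprop="name"/></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only meta tags that have itemprop="ratingValue" and a content attribute,
# converting each content string to a float as it is collected.
ratings = [
    float(tag["content"])
    for tag in soup.select('meta[itemprop="ratingValue"][content]')
]
print(ratings)  # [4.0, 5.0]
```

The trailing [content] in the selector filters out any matching tag that lacks a content attribute, so tag["content"] cannot raise a KeyError.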