Scraping data from Vivino.com


Long-time lurker here; this community has helped me a lot, thank you all.

I'm trying to collect data from vivino.com, but the DataFrame ends up empty. I can see that my soup is picking up the site's content, but I can't see where my mistake is.

My code:


    headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0", "Accept-Encoding":"gzip, deflate", "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "DNT":"1","Connection":"close", "Upgrade-Insecure-Requests":"1"}

    r = requests.get("https://www.vivino.com/explore?e=eJzLLbI1VMvNzLM1UMtNrLA1NTBQS660DQhRS7Z1DQ1SKwDKpqfZliUWZaaWJOao5SfZFhRlJqeq5dsmFierlZdExwJVJFcWA-mCEgC1YxlZ", headers=headers)#, proxies=proxies)
    content = r.content
    soup = BeautifulSoup(content, "html.parser")
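
For what it's worth, this is roughly how I convinced myself the soup is picking up the page at all (just a quick sanity check, not part of the scraper):

    # Quick sanity check: the request succeeds and the HTML parses.
    print(r.status_code)
    print(soup.title.text if soup.title else "no <title> found")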

I need the winery, the wine name and the rating, and this is how I tried to get them:

    for d in soup.findAll('div', attrs={'class': 'explorerCard__titleColumn--28kWX'}):

        Winery = d.find("a", attrs={"class": "VintageTitle_winery--2YoIr"})
        Wine = d.find("a", attrs={"class": "VintageTitle_wine--U7t9G"})
        Rating = d.find("div", attrs={"class": "VivinoRatingWide_averageValue--1zL_5"})
        num_Reviews = d.find("div", attrs={"class": "VivinoRatingWide__basedOn--s6y0t"})
        Stars = d.find("div", attrs={"aria-label": "rating__rating--ZZb_x rating__vivino--1vGCy"})

        alll = []

        if Winery is not None:
            alll.append(Winery.text)
        else:
            alll.append("unknown-winery")

        if Wine is not None:
            alll.append(Wine.text)
        else:
            alll.append("0")

        if Rating is not None:
            alll.append(Rating.text)
        else:
            alll.append("0")
...

Then I put the data into a DataFrame:

    for i in range(1, no_pages+1):
        results.append(get_data())
    flatten = lambda l: [item for sublist in l for item in sublist]
    df = pd.DataFrame(flatten(results), columns=['Winery', 'Wine', 'Rating', 'num_review', 'Stars'])
    df.to_csv('redwines.csv', index=False, encoding='utf-8')

Thank you all.

python pandas web-scraping beautifulsoup data-science
1 Answer

Your data is probably rendered by some JavaScript; fortunately, it is also available as JSON. I checked the Network tab and found it there.

    import requests

    url = "https://www.vivino.com/api/explore/explore?country_code=AU&country_codes[]=pt&currency_code=AUD&grape_filter=varietal&min_rating=1&order_by=price&order=asc&page=1&price_range_max=80&price_range_min=20&wine_type_ids[]=1"

    r = requests.get(url)

    # your data:
    r.json()

There are other JSON files as well; you can check your browser's Network tab to access them.
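
If you want the winery, wine name and rating in a DataFrame like in your original code, you can build it straight from that JSON. Below is a minimal sketch; note that the key names (explore_vintage, matches, vintage, statistics, ...) are assumptions about what this endpoint returns, so inspect r.json() yourself first, and you may need to send a browser-like User-Agent header if the API rejects plain requests:

    import requests
    import pandas as pd

    url = ("https://www.vivino.com/api/explore/explore"
           "?country_code=AU&country_codes[]=pt&currency_code=AUD"
           "&grape_filter=varietal&min_rating=1&order_by=price&order=asc"
           "&page=1&price_range_max=80&price_range_min=20&wine_type_ids[]=1")

    # A browser-like User-Agent; the API may refuse requests without one (assumption).
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) "
                             "Gecko/20100101 Firefox/66.0"}

    r = requests.get(url, headers=headers)
    # Key names below are assumptions -- confirm against your own r.json() output.
    matches = r.json().get("explore_vintage", {}).get("matches", [])

    rows = []
    for m in matches:
        vintage = m.get("vintage", {}) or {}
        wine = vintage.get("wine", {}) or {}
        stats = vintage.get("statistics", {}) or {}
        rows.append({
            "Winery": (wine.get("winery") or {}).get("name", "unknown-winery"),
            "Wine": wine.get("name", "0"),
            "Rating": stats.get("ratings_average", "0"),
            "num_review": stats.get("ratings_count", "0"),
        })

    df = pd.DataFrame(rows, columns=["Winery", "Wine", "Rating", "num_review"])
    df.to_csv("redwines.csv", index=False, encoding="utf-8")

The average rating in the JSON is already the star rating, so you probably don't need a separate Stars column; to scrape more pages, increment the page parameter in the URL the same way your no_pages loop does.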
