Scraper returns empty result

Question (votes: 0, answers: 1)

I am trying to build a scraper that collects the URLs of the free games on the Epic Games Store.


import requests
from bs4 import BeautifulSoup

# The original snippet showed only the function body; the wrapper name is illustrative.
def get_free_game_urls():
    headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0"}
    result = requests.get(
        "https://www.epicgames.com/store/en-US/free-games?sessionInvalidated=true",
        headers=headers,
    )
    soup = BeautifulSoup(result.content, 'lxml')
    urls = []
    links = []
    urls = soup.find('div', {'class': 'CardGrid-group_c5363b6a'}).find_all("a")
    return urls

But it keeps returning null, and I can't see what is wrong.

python screen-scraping
1 answer
1 vote
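This page uses JavaScript to add its elements, but requests / BeautifulSoup cannot run JavaScript, so the elements you are looking for are not in the HTML you download.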

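However, JavaScript usually reads its data from some URL. You can find that URL in DevTools in Firefox/Chrome (tab: Network, filter: XHR) and use it to read the data directly in JSON format, so you don't need BeautifulSoup.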

import requests

url = 'https://store-site-backend-static.ak.epicgames.com/freeGamesPromotions?locale=en-US&country=PL&allowCountries=PL'

r = requests.get(url)
data = r.json()

#print(r.text)

for item in data['data']['Catalog']['searchStore']['elements']:
    print(item['title'])
    offers = item['promotions']['promotionalOffers']
    for offer in offers:
        print(offer['promotionalOffers'][0]['startDate'])
        print(offer['promotionalOffers'][0]['endDate'])

Result

Mystery Game
Grand Theft Auto V
2020-05-14T15:00:00.000Z
2020-05-21T15:00:00.000Z

You will have to dig into data to get other details.
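As a rough sketch of that digging, the helper below collects the title and promotion dates from an already parsed response. The function name `list_promotions` and the `sample` payload are made up for illustration; only the key names come from the answer's code. It also assumes that `promotions` can be missing or null for some items, which the answer does not confirm, and it walks every inner offer instead of taking only index 0.

```python
def list_promotions(data):
    """Return (title, startDate, endDate) tuples for every listed offer."""
    rows = []
    for item in data['data']['Catalog']['searchStore']['elements']:
        # Assumption: 'promotions' may be absent or None for some items.
        promotions = item.get('promotions') or {}
        for offer in promotions.get('promotionalOffers') or []:
            for promo in offer.get('promotionalOffers') or []:
                rows.append((item['title'], promo['startDate'], promo['endDate']))
    return rows


# Hypothetical payload that mimics only the keys used above.
sample = {'data': {'Catalog': {'searchStore': {'elements': [
    {'title': 'Mystery Game', 'promotions': {'promotionalOffers': []}},
    {'title': 'Grand Theft Auto V', 'promotions': {'promotionalOffers': [
        {'promotionalOffers': [
            {'startDate': '2020-05-14T15:00:00.000Z',
             'endDate': '2020-05-21T15:00:00.000Z'},
        ]},
    ]}},
    {'title': 'Hypothetical Game', 'promotions': None},
]}}}}

for row in list_promotions(sample):
    print(row)
```

With a live response you could pass `r.json()` instead of `sample`. Printing `item.keys()` for one element is a quick way to see which other fields are available.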

BTW: you may have to use different values for country and allowCountries.
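One way to try other values is to build the query string from parameters instead of hard-coding it. This is only a sketch: `build_promotions_url` is a hypothetical helper, it assumes `allowCountries` takes the same code as `country` (as in the URL above), and whether the endpoint accepts a given country code is something you would have to check yourself.

```python
from urllib.parse import urlencode

BASE_URL = 'https://store-site-backend-static.ak.epicgames.com/freeGamesPromotions'


def build_promotions_url(country, locale='en-US'):
    # Assumption: allowCountries mirrors country, as in the answer's URL.
    params = {'locale': locale, 'country': country, 'allowCountries': country}
    return BASE_URL + '?' + urlencode(params)


print(build_promotions_url('PL'))
```

The resulting string can be passed to `requests.get()` in place of the hard-coded `url`.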
