I'm trying to build a scraper that gets the URLs of the free games on the Epic Games Store.
import requests
from bs4 import BeautifulSoup

def get_free_game_links():
    headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0"}
    result = requests.get(
        "https://www.epicgames.com/store/en-US/free-games?sessionInvalidated=true",
        headers=headers,
    )
    soup = BeautifulSoup(result.content, 'lxml')
    # Look up the card grid and collect every <a> tag inside it
    urls = soup.find('div', {'class': 'CardGrid-group_c5363b6a'}).find_all("a")
    return urls
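The page adds these elements with JavaScript, but requests/BeautifulSoup can't run JavaScript.
However, JavaScript usually reads its data from some URL. You can find that URL in DevTools in Chrome/Firefox (tab: Network, filter: XHR) and use it to read the data directly as JSON, so you don't need BeautifulSoup.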
import requests

url = 'https://store-site-backend-static.ak.epicgames.com/freeGamesPromotions?locale=en-US&country=PL&allowCountries=PL'

r = requests.get(url)
data = r.json()
#print(r.text)

# Every element is one game; print its title and the dates of its current offers
for item in data['data']['Catalog']['searchStore']['elements']:
    print(item['title'])
    offers = item['promotions']['promotionalOffers']
    for offer in offers:
        print(offer['promotionalOffers'][0]['startDate'])
        print(offer['promotionalOffers'][0]['endDate'])
Result:
Mystery Game
Grand Theft Auto V
2020-05-14T15:00:00.000Z
2020-05-21T15:00:00.000Z
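To work with these values instead of only printing them, the same traversal can collect one record per offer. This is a minimal sketch, not part of the original answer: `parse_timestamp` and `extract_offers` are hypothetical helper names, the `sample` dict is hand-written to mirror the structure used above (the "Placeholder Title" entry is invented), and the guard for a missing or `None` `promotions` value is an assumption about what the endpoint may return. Unlike the loop above, it walks every inner offer rather than only index `[0]`.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse a timestamp such as '2020-05-14T15:00:00.000Z' into an aware UTC datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def extract_offers(data):
    """Return one dict per promotional offer found in the parsed JSON."""
    records = []
    for item in data["data"]["Catalog"]["searchStore"]["elements"]:
        # Assumption: 'promotions' may be missing or None for some items.
        promotions = item.get("promotions") or {}
        for group in promotions.get("promotionalOffers") or []:
            for offer in group.get("promotionalOffers") or []:
                records.append({
                    "title": item["title"],
                    "start": parse_timestamp(offer["startDate"]),
                    "end": parse_timestamp(offer["endDate"]),
                })
    return records


# Hand-written sample that mirrors the structure used above (illustrative only).
sample = {"data": {"Catalog": {"searchStore": {"elements": [
    {"title": "Mystery Game", "promotions": {"promotionalOffers": []}},
    {"title": "Grand Theft Auto V", "promotions": {"promotionalOffers": [
        {"promotionalOffers": [
            {"startDate": "2020-05-14T15:00:00.000Z",
             "endDate": "2020-05-21T15:00:00.000Z"},
        ]},
    ]}},
    {"title": "Placeholder Title", "promotions": None},  # hypothetical entry
]}}}}

offers = extract_offers(sample)
for offer in offers:
    print(offer["title"], offer["start"].isoformat(), offer["end"].isoformat())
```

With real data you would pass the result of `r.json()` to `extract_offers` in place of `sample`.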
You should dig into data to get other details. BTW: you may have to use different values for country and allowCountries.
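If you need to switch countries, one option is to build the query string from parameters rather than editing the URL by hand. This is only a sketch: `build_promotions_url` is a hypothetical helper, and it assumes that `country` and `allowCountries` take the same code, as they do in the URL above. Which country codes the endpoint actually accepts is not confirmed here.

```python
from urllib.parse import urlencode

BASE_URL = "https://store-site-backend-static.ak.epicgames.com/freeGamesPromotions"


def build_promotions_url(country="PL", locale="en-US"):
    """Build the promotions URL for one country code (hypothetical helper)."""
    # Assumption: 'allowCountries' mirrors 'country', as in the URL above.
    params = {"locale": locale, "country": country, "allowCountries": country}
    return BASE_URL + "?" + urlencode(params)


print(build_promotions_url("PL"))
```

The returned string can be passed to `requests.get(...)` as before. `requests.get` also accepts a `params=` dict, which would do the same encoding for you.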