I'm trying to scrape the corner odds from this site:
https://www.pinnacle.com/en/soccer/england-premier-league/matchups/#leagueType:corners
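The odds are delayed when you are not logged in, but that is not a problem. The problem appears when I try to scrape through the hidden guest API: I always get a 401 error, "No authorization token provided". Yet if I paste the API URL into Chrome, it works perfectly (after having loaded the web page first).
Here is my code: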
import json
import requests
from piapy import PiaVpn
import pandas as pd
# Instantiate
vpn = PiaVpn()
# Get connection status
vpn.status() # equivalent to `piactl get connectionstate`
# Will connect to server, displaying status in stdout
vpn.connect(verbose=True, timeout=20)
base_url = "https://guest.api.arcadia.pinnacle.com/0.1/leagues/"
api_key = "CmX2KcMrXuFmNg6YFbmTxE0y9CIrOi0R"
device_uuid = "feba3685-a63cfba3-8da41a19-075431fe"
headers = {
    'Accept': 'application/json',
    'Content-Type': 'application/json',
    'Referer': 'https://www.pinnacle.com/',
    'Sec-Ch-Ua': '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"',
    'Sec-Ch-Ua-Mobile': '?0',
    'Sec-Ch-Ua-Platform': '"Windows"',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'X-Api-Key': api_key,
    'X-Device-Uuid': device_uuid,
    'Cache-Control': 'no-cache',
    'Pragma': 'no-cache',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-site',
}
league_numbers = [1980, 2421] # Add all league numbers here
matches_data = []
for league_number in league_numbers:
    url = f"{base_url}{league_number}/matchups?brandId=0"
    with requests.session() as s:
        # load cookies:
        s.get(url, headers=headers)
        # get data:
        data = s.get(url, headers=headers).json()
        # print data to screen:
        print(json.dumps(data, indent=4))
    try:
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            data = response.json()
            filtered_matches = [
                match for match in data if "Corners" in match.get("league", {}).get("name", "")
            ]
            for match in filtered_matches:
                match_id = match.get("id")
                parent_id = match.get("parent", {}).get("id")
                home_team = match.get("parent", {}).get("participants", [])[0].get("name")
                away_team = match.get("parent", {}).get("participants", [])[1].get("name")
                start_time = match.get("parent", {}).get("startTime")
                matches_data.append({
                    "Match_ID": match_id,
                    "Parent_ID": parent_id,
                    "Home_Team": home_team,
                    "Away_Team": away_team,
                    "Start_Time": start_time,
                    "League_Number": league_number  # Add League number
                })
        else:
            print(f"Error for league {league_number}:", response.status_code)
    except requests.RequestException as e:
        print(f"Request failed for league {league_number}:", e)
# Creating a DataFrame from extracted data
df = pd.DataFrame(matches_data)
# Writing DataFrame to an Excel file
file_path = 'matches_data.xlsx'
df.to_excel(file_path, index=False)
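I tried copying the headers I found while inspecting the Network tab in the developer tools, but it doesn't work... Any help would be greatly appreciated!
Thanks everyone, and Merry Christmas.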
I was able to download the league data using only the X-API-Key header. Maybe you are sending the server too much information:
import requests
url = "https://guest.api.arcadia.pinnacle.com/0.1/leagues/{league}/matchups?brandId=0"
headers = {
    "X-API-Key": "CmX2KcMrXuFmNg6YFbmTxE0y9CIrOi0R",
}
leagues = [1980, 2421]
for l in leagues:
    data = requests.get(url.format(league=l), headers=headers).json()
    print(f"League={l} data len={len(data)}")
Prints:
League=1980 data len=783
League=2421 data len=49
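If you want to find out which of the extra headers the server objects to, one option is to start from the minimal set that works and add the others back one at a time. The following is only a rough sketch of that idea, not something I ran against the Pinnacle API: find_rejected_headers and probe are hypothetical names, and probe is assumed to be any callable that sends a request with the given headers and returns the HTTP status code. The demo at the bottom uses a fake probe and made-up header names so it runs offline.

```python
from typing import Callable, Dict, List


def find_rejected_headers(
    headers: Dict[str, str],
    required: List[str],
    probe: Callable[[Dict[str, str]], int],
) -> List[str]:
    """Return the optional headers that cause a non-200 status when added
    on their own to the required ones.

    `probe` is a hypothetical callable: it receives a header dict and
    returns the HTTP status code the server answered with.
    """
    # Start from the minimal set that is expected to work.
    base = {name: headers[name] for name in required}
    if probe(base) != 200:
        raise ValueError("the minimal header set is already rejected")

    rejected = []
    for name, value in headers.items():
        if name in base:
            continue
        # Add exactly one optional header back and see what happens.
        if probe({**base, name: value}) != 200:
            rejected.append(name)
    return rejected


# Offline demo with a fake probe and invented header names
# (an assumption for illustration, not real server behaviour).
def fake_probe(h: Dict[str, str]) -> int:
    return 401 if "X-Demo-Bad" in h else 200


demo_headers = {
    "X-API-Key": "demo-key",
    "Accept": "application/json",
    "X-Demo-Bad": "1",
}
print(find_rejected_headers(demo_headers, ["X-API-Key"], fake_probe))
# → ['X-Demo-Bad']
```

With requests, probe could be something like `lambda h: requests.get(url, headers=h, timeout=10).status_code`, with headers set to the full dict from the question and required set to the one key that is known to work. Note that this only catches headers that trigger the rejection individually; if the 401 depends on a combination of headers, it will not show up here.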