Problem scraping from the API, always receiving a 401 message


I am trying to scrape the corners odds on this website:

https://www.pinnacle.com/en/soccer/england-premier-league/matchups/#leagueType:corners

The odds are delayed when not logged in, but that is not the problem. The problem comes when I try to scrape using the hidden guest API: I always receive a 401 error, "No authorization token provided". Yet if I paste the API URL into Chrome it works perfectly (after loading the web page first).

Here is my code:

import json
import requests
from piapy import PiaVpn
import pandas as pd

# Instantiate
vpn = PiaVpn()

# Get connection status
vpn.status() # equivalent to `piactl get connectionstate`

# Will connect to server, displaying status in stdout
vpn.connect(verbose=True, timeout=20)

base_url = "https://guest.api.arcadia.pinnacle.com/0.1/leagues/"
api_key = "CmX2KcMrXuFmNg6YFbmTxE0y9CIrOi0R"
device_uuid = "feba3685-a63cfba3-8da41a19-075431fe"

# Headers copied from the Network tab in Chrome DevTools
headers = {
    'Accept': 'application/json',
    'Content-Type': 'application/json',
    'Referer': 'https://www.pinnacle.com/',
    'Sec-Ch-Ua': '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"',
    'Sec-Ch-Ua-Mobile': '?0',
    'Sec-Ch-Ua-Platform': '"Windows"',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'X-Api-Key': api_key,
    'X-Device-Uuid': device_uuid,
    'Cache-Control': 'no-cache',
    'Pragma': 'no-cache',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-site',
}

league_numbers = [1980, 2421]  # Add all league numbers here
matches_data = []

for league_number in league_numbers:
    url = f"{base_url}{league_number}/matchups?brandId=0"

    with requests.Session() as s:

        # load cookies:
        s.get(url, headers=headers)

        # get data:
        data = s.get(url, headers=headers).json()

        # print data to screen:
        print(json.dumps(data, indent=4))
        

    try:
        response = requests.get(url, headers=headers)

        if response.status_code == 200:
            data = response.json()
            filtered_matches = [
                match for match in data if "Corners" in match.get("league", {}).get("name", "")
            ]

            for match in filtered_matches:
                match_id = match.get("id")
                parent_id = match.get("parent", {}).get("id")
                home_team = match.get("parent", {}).get("participants", [])[0].get("name")
                away_team = match.get("parent", {}).get("participants", [])[1].get("name")
                start_time = match.get("parent", {}).get("startTime")

                matches_data.append({
                    "Match_ID": match_id,
                    "Parent_ID": parent_id,
                    "Home_Team": home_team,
                    "Away_Team": away_team,
                    "Start_Time": start_time,
                    "League_Number": league_number  # Add League number
                })
        else:
            print(f"Error for league {league_number}:", response.status_code)
    except requests.RequestException as e:
        print(f"Request failed for league {league_number}:", e)

# Creating a DataFrame from extracted data
df = pd.DataFrame(matches_data)

# Writing DataFrame to an Excel file
file_path = 'matches_data.xlsx'
df.to_excel(file_path, index=False)
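
For what it's worth, even a stripped-down request reproduces the error. A minimal sketch (the 401 message is the one the API returns to me):

import requests

# Minimal repro: one league endpoint, none of the browser headers
url = "https://guest.api.arcadia.pinnacle.com/0.1/leagues/1980/matchups?brandId=0"
resp = requests.get(url)
print(resp.status_code)  # 401
print(resp.text)         # body contains "No authorization token provided"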

I tried copying the headers I found while inspecting the Network tab in the developer tools, but it doesn't work... Any help would be much appreciated!

Thanks everyone, and merry Christmas to you all

python api web-scraping http-headers
1 Answer

I was able to download the league data using only the X-API-Key header. Maybe you are providing too much information to the server:

import requests

url = "https://guest.api.arcadia.pinnacle.com/0.1/leagues/{league}/matchups?brandId=0"

headers = {
    "X-API-Key": "CmX2KcMrXuFmNg6YFbmTxE0y9CIrOi0R",  # the only header that seems to be required
}

leagues = [1980, 2421]

for league in leagues:
    data = requests.get(url.format(league=league), headers=headers).json()
    print(f"League={league} data len={len(data)}")

Prints:

League=1980 data len=783
League=2421 data len=49
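
If you want to go on and build the spreadsheet from your original question, the same single header is enough. A sketch that reuses your own filtering logic (the JSON field names are taken from your code, not from any API documentation):

import requests
import pandas as pd

url = "https://guest.api.arcadia.pinnacle.com/0.1/leagues/{league}/matchups?brandId=0"
headers = {"X-API-Key": "CmX2KcMrXuFmNg6YFbmTxE0y9CIrOi0R"}

matches_data = []
for league in [1980, 2421]:
    data = requests.get(url.format(league=league), headers=headers).json()

    for match in data:
        # keep only the "Corners" matchups, as in the original filter
        if "Corners" not in match.get("league", {}).get("name", ""):
            continue
        parent = match.get("parent") or {}  # guard against a null parent
        participants = parent.get("participants", [])
        matches_data.append({
            "Match_ID": match.get("id"),
            "Parent_ID": parent.get("id"),
            "Home_Team": participants[0].get("name") if len(participants) > 0 else None,
            "Away_Team": participants[1].get("name") if len(participants) > 1 else None,
            "Start_Time": parent.get("startTime"),
            "League_Number": league,
        })

pd.DataFrame(matches_data).to_excel("matches_data.xlsx", index=False)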