POST request returns error 400 when trying to search a website


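I am trying to search for a movie title at https://classindportal.mj.gov.br/consulta-filmes and scrape the results page. I know this involves an intermediate step of sending a specific request, containing my search term, to the site, but so far I have not been able to make that request work.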

In Chrome DevTools, the Network tab shows the following information:

Request URL: https://classindportal.mj.gov.br/api/solicitacao-classificacao-consultas/list
Request Method: POST
Status Code: 200 OK
Referrer Policy: strict-origin-when-cross-origin

and the request payload contains a key tituloBr whose value is the search term (for example, if I type "shrek" into the search bar and press Enter, the payload is {'tituloBr': 'shrek'}).

I believe the search amounts to sending a POST request to the Request URL shown above with the data {'tituloBr': 'shrek'}, so I used the requests library as follows:

import requests

payload = {'tituloBr': 'shrek'}
r = requests.post('https://classindportal.mj.gov.br/api/solicitacao-classificacao-consultas/list', data=payload)

But this returns status code 400, and r.reason is 'Bad Request'.

As far as I can tell, nothing is wrong with the URL or the data I am sending, so I am not sure what the problem is.

python python-requests
1 Answer

I inspected the page, and it looks like you need to supply a token, which you can obtain by sending a POST request to:

https://sso.mj.gov.br/auth/realms/PRD/protocol/openid-connect/token

So fetch the token first, then use it in a second request to the API to search for the movie you want.
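Because the token is requested at run time, it can help to check how long it stays valid. The following is a minimal sketch, assuming the access token is a standard JWT whose payload carries `exp` and `iat` claims. The helper names (`jwt_claims`, `token_lifetime_seconds`, `_make_sample_token`) and the sample claim values are illustrative, and the signature is deliberately not verified, so this is only suitable for inspection.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying its signature."""
    payload_b64 = token.split(".")[1]
    # base64url segments omit padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def token_lifetime_seconds(token: str) -> int:
    """Return exp - iat, i.e. how long the token is valid after being issued."""
    claims = jwt_claims(token)
    return claims["exp"] - claims["iat"]


def _make_sample_token(claims: dict) -> str:
    """Build an unsigned, JWT-shaped string purely for demonstration."""
    def b64(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

    return f"{b64({'alg': 'none'})}.{b64(claims)}.signature"


# Hypothetical sample claims: issued at 1706541773, expiring at 1706542073.
sample = _make_sample_token({"exp": 1706542073, "iat": 1706541773})
print(token_lifetime_seconds(sample))  # 300
```

A token with a lifetime this short cannot usefully be pasted into a script ahead of time. The full script below sidesteps that by requesting a new token on every run.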

import requests


SEARCH_TERM = "shrek"

token_url = "https://sso.mj.gov.br/auth/realms/PRD/protocol/openid-connect/token"
movies_url = (
    "https://classindportal.mj.gov.br/api/solicitacao-classificacao-consultas/list"
)


headers = {
    "Accept": "application/json, text/plain, */*",
    "Accept-Language": "en-US,en;q=0.9,he;q=0.8",
    # "Authorization" is added below, once a fresh token has been fetched.
    "Connection": "keep-alive",
    "Origin": "https://classindportal.mj.gov.br",
    "Referer": "https://classindportal.mj.gov.br/consulta-filmes",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-origin",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36",
    "sec-ch-ua": '"Not A(Brand";v="99", "Google Chrome";v="121", "Chromium";v="121"',
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"macOS"',
}


json_data = {
    "currentPage": 0,
    "pageSize": 10,
    "sortItem": None,
    "totalResults": None,
    "itens": None,
    "tituloBr": SEARCH_TERM,
    "tituloOr": "",
    "requerente": "",
    "produtor": "",
    "editora": "",
    "idModulo": 1,
}


token_data = {
    "client_id": "classind-consultapublica-frontend",
    "client_secret": "4PmaBa8bBeVow40SKFNb7qNHzAxuLoqz",
    "grant_type": "client_credentials",
    "scope": "classind-backend",
}


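with requests.Session() as session:
    # Fetch a fresh access token (client-credentials grant).
    token_response = session.post(token_url, data=token_data)
    token_response.raise_for_status()
    headers["Authorization"] = f"Bearer {token_response.json()['access_token']}"
    # Search with the filters sent as a JSON body.
    response = session.post(movies_url, json=json_data, headers=headers)
    response.raise_for_status()
    print(response.json())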
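Besides the token, the search request above differs from the attempt in the question in one more way: the body is passed with json= rather than data=. With a dict, requests form-encodes a data= body and serializes a json= body as JSON, setting the Content-Type header to match. The sketch below uses only the standard library to show what the two bodies look like; it does not establish which of the two differences (missing token or body format) caused the 400.

```python
import json
from urllib.parse import urlencode

payload = {"tituloBr": "shrek"}

# Equivalent to what requests sends for data=payload
# (Content-Type: application/x-www-form-urlencoded).
form_body = urlencode(payload)

# Equivalent to what requests sends for json=payload
# (Content-Type: application/json).
json_body = json.dumps(payload)

print(form_body)  # tituloBr=shrek
print(json_body)  # {"tituloBr": "shrek"}
```

The DevTools payload shown in the question is a JSON object, so json= is the form that mirrors what the browser sends.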

If you like, you can even load the data into a pandas DataFrame:

import pandas as pd
# ...

with requests.Session() as session:
    token = session.post(token_url, data=token_data).json()["access_token"]
    headers["Authorization"] = f"Bearer {token}"
    response = session.post(movies_url, json=json_data, headers=headers)
    data = response.json()["itens"]
    df = pd.DataFrame(data)
    print(df)

which prints:

       id       tituloBrasil  ... classificacaoAtribuida classificacaoPretendida
0  164346              SHREK  ...                  Livre                    None
1  164345            SHREK 2  ...                  Livre                    None
2  164344  SHREK PARA SEMPRE  ...                  Livre                    None
3  164343     SHREK TERCEIRO  ...                  Livre                    None
4  146845            SHREK 2  ...                  Livre                    None
5  146844     SHREK TERCEIRO  ...                  Livre                    None
6  135770              SHREK  ...                  Livre                    None
7  135769            SHREK 2  ...                  Livre                    None
8  135768  SHREK PARA SEMPRE  ...                  Livre                    None
9  135767     SHREK TERCEIRO  ...                  Livre                    None

[10 rows x 8 columns]
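The output stops at 10 rows because the request asks for "pageSize": 10 starting at "currentPage": 0. To collect further results, one option is to step through the pages. The following is a rough sketch under two assumptions that the answer does not confirm: that currentPage is zero-based for every page, and that a page past the end comes back with an empty or null "itens" list. The names iter_items, fetch_page and fake_fetch are hypothetical, and fake_fetch stands in for the real HTTP call so the loop can be tried offline.

```python
from typing import Callable, Iterator


def iter_items(fetch_page: Callable[[int], dict], max_pages: int = 100) -> Iterator[dict]:
    """Yield items page by page until a page comes back empty.

    fetch_page(page) must return the decoded JSON response for that page.
    max_pages is a safety cap in case the API never returns an empty page.
    """
    for page in range(max_pages):
        items = fetch_page(page).get("itens") or []
        if not items:
            return
        yield from items


def fake_fetch(page: int) -> dict:
    """Offline stand-in for the API: two pages of results, then nothing."""
    pages = [[{"id": 1}, {"id": 2}], [{"id": 3}]]
    return {"itens": pages[page] if page < len(pages) else None}


print([item["id"] for item in iter_items(fake_fetch)])  # [1, 2, 3]
```

Against the real API, fetch_page could be a small function that posts {**json_data, "currentPage": page} with the same session and headers as above and returns response.json(). The resulting list of items can then be passed to pd.DataFrame in one go.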