Python requests cannot download a PDF file from a URL

Question (votes: 0, answers: 2)

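I usually use the requests library to download a PDF from a given URL, but this time it does not work, and I suspect the website is the cause. I read online that adding headers helps in some cases, but after trying several of them the result is the same: the file downloads, but it cannot be opened because it appears to be corrupted.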

Is there another way to download the PDF from this site successfully? Here is the snippet from my latest attempt:

import requests

url = 'https://www.adgm.com/documents/operating-in-adgm/ongoing-obligation/enforcement/alpha-development-middle-east-ltd-penalty-notice-redacted.pdf?la=en&hash=5EA2DA7D1492D105375580EEF2FB088F'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'}
response = requests.get(url, stream = True, headers = headers)

with open('sample.pdf', 'wb') as f:
    f.write(response.content)

Any alternative suggestion that lets me download the PDF correctly would be greatly appreciated.

Thanks.

python pdf python-requests urllib python-pdfreader
2 Answers
1 vote

This particular site requires both the Accept-Language and User-Agent headers. To download the document, you can do the following:

import requests

PDF = "alpha-development-middle-east-ltd-penalty-notice-redacted.pdf"

URL = f"https://www.adgm.com/documents/operating-in-adgm/ongoing-obligation/enforcement/{PDF}"

PARAMS = {
    "la": "en",
    "hash": "5EA2DA7D1492D105375580EEF2FB088F"
}

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_3_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15",
    "Accept-Language": "en-GB,en;q=0.9,en-US;q=0.8,pt;q=0.7"
}

CHUNK = 32 * 1024

with requests.get(URL, headers=HEADERS, params=PARAMS, stream=True) as response:
    response.raise_for_status()
    with open(PDF, "wb") as output:
        for data in response.iter_content(CHUNK):
            output.write(data)
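
Since the question describes a file that downloads but will not open, it may help to check whether the saved bytes are a PDF at all. A well-formed PDF begins with the signature %PDF-, whereas a blocked request often saves an HTML error page under a .pdf name. The sketch below is only a diagnostic aid; the names classify_payload and classify_file are hypothetical and not part of requests.

```python
PDF_SIGNATURE = b"%PDF-"  # a well-formed PDF starts with these bytes


def classify_payload(head):
    """Guess what the first bytes of a download contain.

    Returns "pdf", "html", or "unknown". This is a heuristic, not a validator.
    """
    stripped = head.lstrip()  # ignore leading ASCII whitespace
    if stripped.startswith(PDF_SIGNATURE):
        return "pdf"
    lowered = stripped.lower()
    if lowered.startswith(b"<!doctype html") or lowered.startswith(b"<html"):
        return "html"
    return "unknown"


def classify_file(path, preview_bytes=256):
    """Read the beginning of a saved file and classify it (hypothetical helper)."""
    with open(path, "rb") as f:
        return classify_payload(f.read(preview_bytes))
```

Calling classify_file("sample.pdf") on the file written by the question's snippet would show whether the server sent a PDF or an HTML page; if it reports "html", the request was most likely rejected and the headers are worth revisiting.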

0 votes

If you have trouble downloading a PDF from a URL with Python's requests library, there can be several causes. Here is a simple example of downloading a PDF with requests:

import requests

url = "https://example.com/path/to/your/file.pdf"

response = requests.get(url)

if response.status_code == 200:
    with open("downloaded_file.pdf", "wb") as f:
        f.write(response.content)
    print("File downloaded successfully.")
else:
    print(f"Failed to download file. Status code: {response.status_code}")

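Be sure to replace the url variable with the actual URL of the PDF you want to download. Also check the response status code to confirm that the request succeeded (status code 200).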
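
A 200 status on its own may not prove that the body is a PDF, because a server can answer 200 with an HTML page. As a rough extra check, the Content-Type header can be inspected before writing the file. The sketch below assumes the server labels PDFs as application/pdf (some servers use application/octet-stream instead, so treat a mismatch as a hint, not proof); content_type_of and is_pdf_response are hypothetical helper names.

```python
def content_type_of(headers):
    """Return the media type from a headers mapping, without parameters.

    Header names are matched case-insensitively so a plain dict works too.
    """
    for name, value in headers.items():
        if name.lower() == "content-type":
            # Drop parameters such as "; charset=utf-8"
            return value.split(";", 1)[0].strip().lower()
    return ""


def is_pdf_response(status_code, headers):
    """True if the status is 200 and the server declares a PDF body."""
    return status_code == 200 and content_type_of(headers) == "application/pdf"
```

With requests, this could be called as is_pdf_response(response.status_code, response.headers) before opening the output file.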
