Downloading a PDF with requests does not work


I wrote some code that uses requests to download a PDF file from a link:

import requests

url = "https://disclosure.bursamalaysia.com/FileAccess/apbursaweb/download?id=231746&name=EA_DS_ATTACHMENTS"

response = requests.get(url)

with open("EA_DS_ATTACHMENTS.pdf", "wb") as f:
    f.write(response.content)

print("PDF downloaded successfully!")

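However, this does not work: the file it saves is a PDF that cannot be opened. I suspect the URL is not a proper PDF download link, but I'm not sure, as I'm new to this.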

python pdf python-requests
1 Answer

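The request made with requests returns a 403 response. Judging by the User-Agent in the request headers, the server appears to block the default python-requests client. You can send a custom header that mimics a browser's User-Agent to fetch the PDF document.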

import requests

url = (
    'https://disclosure.bursamalaysia.com/FileAccess/apbursaweb/download?'
    'id=231746&name=EA_DS_ATTACHMENTS'
)

res_bad = requests.get(url)
print(res_bad, res_bad.request.headers)

# prints:
# <Response [403]> {'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 
# 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive'}

# this is a Firefox user agent string
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:124.0) '
                         'Gecko/20100101 Firefox/124.0'}

res_good = requests.get(url, headers=headers)
res_good.raise_for_status()  # stop here if the server still refuses the request
with open("EA_DS_ATTACHMENTS.pdf", "wb") as f:
    f.write(res_good.content)
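
The original script wrote the 403 error page to disk under a .pdf name, which is why the file could not be opened. A small check before saving can catch that. The sketch below is only an illustration: `describe_download` is a hypothetical helper name, not part of requests, and it assumes a valid PDF begins with the bytes `%PDF-`, which holds for typical files but is a simplification.

```python
def describe_download(status_code: int, body: bytes) -> str:
    """Return "ok" if the response looks like a PDF, else a short reason.

    Hypothetical helper for illustration; it does no network I/O itself.
    """
    # Anything other than 200 means the body is an error page, not the file.
    if status_code != 200:
        return f"HTTP {status_code}"
    # Assumption: a PDF starts with the "%PDF-" signature.
    if not body.startswith(b"%PDF-"):
        return "not a PDF"
    return "ok"
```

With the response from the answer above, it could be called as `describe_download(res_good.status_code, res_good.content)`, writing the file only when the result is `"ok"`.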