Unable to download a file with the Python requests library

Question (votes: 0, answers: 2)

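I am trying to download a PDF from the URL below, but the saved file shows no content. When I try a different URL, it works fine. Can someone explain what the problem might be? Is it related to this website? Here is the code: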

import requests

pdf_url="https://www.npci.org.in/PDF/nach/circular/2015-16/Circular_No_126.pdf"
pdf_title="test"

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:124.0) Gecko/20100101 Firefox/124.0"}

response = requests.get(pdf_url, headers, stream = True)
if response.status_code==200:
    content = next(response.iter_content(10))

    with open(f"{pdf_title}.pdf", "wb") as fd:
        fd.write(response.content)
python python-requests request
2 Answers
0
votes

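Try this.

Instead of reading only the first chunk of the response, read the whole body to see whether it contains any meaningful data. You can do this by removing the next(response.iter_content(10)) line and writing response.content directly to the file.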

import requests

pdf_url = "https://www.npci.org.in/PDF/nach/circular/2015-16/Circular_No_126.pdf"
pdf_title = "test.pdf"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:124.0) Gecko/20100101 Firefox/124.0"
}

response = requests.get(pdf_url, headers=headers, stream=True)
if response.status_code == 200:
    with open(pdf_title, "wb") as fd:
        fd.write(response.content)
        print("PDF downloaded successfully!")
else:
    print("Failed to download PDF. Status code:", response.status_code)
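
One further difference between the question's code and the snippet above is worth spelling out. The signature is requests.get(url, params=None, **kwargs), so in the question's call requests.get(pdf_url, headers, stream = True) the dictionary is taken as query parameters rather than as HTTP headers. The following is a minimal sketch that shows this without any network access, by preparing the requests instead of sending them. The shortened User-Agent value is an assumption made for brevity.

```python
import requests

url = "https://www.npci.org.in/PDF/nach/circular/2015-16/Circular_No_126.pdf"
headers = {"User-Agent": "Mozilla/5.0"}  # shortened value, for illustration only

# Build the requests without sending them, so nothing touches the network.
# Passing the dict as `params` mirrors requests.get(url, headers) in the question.
positional = requests.Request("GET", url, params=headers).prepare()
# Passing it by keyword mirrors requests.get(url, headers=headers).
keyword = requests.Request("GET", url, headers=headers).prepare()

print(positional.url)                      # the dict ended up in the query string
print("User-Agent" in positional.headers)  # no User-Agent header was set
print(keyword.url)                         # URL unchanged
print(keyword.headers["User-Agent"])       # header set as intended
```

This alone may not explain the empty PDF, but it does mean the browser-like User-Agent in the question was never sent as a header.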

0
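votes

Your request is being rejected, so you are not receiving the document you expect. What you receive instead is an HTML page reporting an error.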

import requests

url = "https://www.npci.org.in/PDF/nach/circular/2015-16/Circular_No_126.pdf"
target = url.split("/")[-1]

with requests.get(url, stream=True) as response:
    response.raise_for_status()
    with open(target, "wb") as output:
        for chunk in response.iter_content(4096):
            output.write(chunk)

with open(target) as result:
    print(result.read())

Output:

<html><head><title>Request Rejected</title></head><body>The requested URL was rejected. Please consult with your administrator.<br><br>Your support ID is: <11940751230596137624><br><br><a href='javascript:history.back();'>[Go Back]</body></html>
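
Since raise_for_status() did not raise in the snippet above, the rejection page evidently arrived with a non-error status, so checking the status code alone would not catch it. A possible guard, sketched below, is to inspect the first bytes of the body before saving: PDF files begin with the signature %PDF-. The helper names looks_like_pdf and download_pdf, the 30-second timeout, and the error message are assumptions for illustration and do not come from the original answers.

```python
import requests

PDF_SIGNATURE = b"%PDF-"  # PDF files begin with these bytes


def looks_like_pdf(first_bytes):
    """Return True if the data starts with the PDF signature."""
    return first_bytes.startswith(PDF_SIGNATURE)


def download_pdf(url, target, headers=None):
    """Hypothetical helper: save url to target only if the body looks like a PDF."""
    with requests.get(url, headers=headers, stream=True, timeout=30) as response:
        response.raise_for_status()
        chunks = response.iter_content(4096)
        # Look at the first chunk before creating the output file.
        first = next(chunks, b"")
        if not looks_like_pdf(first):
            content_type = response.headers.get("Content-Type", "unknown")
            raise ValueError(
                f"Not a PDF (Content-Type: {content_type}): {first[:80]!r}"
            )
        with open(target, "wb") as output:
            output.write(first)
            for chunk in chunks:
                output.write(chunk)


# Offline check of the signature test, using the start of the page shown above.
print(looks_like_pdf(b"%PDF-1.4\n"))
print(looks_like_pdf(b"<html><head><title>Request Rejected</title>"))
```

If the server still returns the page shown above, calling download_pdf(url, target) would be expected to raise ValueError instead of silently writing HTML into a .pdf file.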