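I'm trying to download a PDF from the URL below, but the saved file shows no content. When I try a different URL, the same code works fine. Can someone explain what the problem might be? Does it have something to do with this website? Here is the code: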
import requests
pdf_url = "https://www.npci.org.in/PDF/nach/circular/2015-16/Circular_No_126.pdf"
pdf_title = "test"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:124.0) Gecko/20100101 Firefox/124.0"}
response = requests.get(pdf_url, headers, stream=True)
if response.status_code == 200:
    content = next(response.iter_content(10))
    with open(f"{pdf_title}.pdf", "wb") as fd:
        fd.write(response.content)
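Try this:
Instead of reading only the first chunk of the response content, read the entire content to see whether it contains any meaningful data. You can do this by removing the `next(response.iter_content(10))` line and writing `response.content` directly to the file. Also note that `headers` is passed below as the keyword argument `headers=headers`. When it is passed positionally, as in the question, `requests.get` treats it as the `params` argument and sends it as query-string parameters instead of HTTP headers.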
import requests
pdf_url = "https://www.npci.org.in/PDF/nach/circular/2015-16/Circular_No_126.pdf"
pdf_title = "test.pdf"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:124.0) Gecko/20100101 Firefox/124.0"
}
response = requests.get(pdf_url, headers=headers, stream=True)
if response.status_code == 200:
    with open(pdf_title, "wb") as fd:
        fd.write(response.content)
    print("PDF downloaded successfully!")
else:
    print("Failed to download PDF. Status code:", response.status_code)
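One difference between the question's code and the code above is how `headers` reaches `requests.get`, whose signature is `get(url, params=None, **kwargs)`. The sketch below builds both variants with `requests.Request(...).prepare()` so they can be inspected without sending anything; it assumes only that `requests` is installed. It reuses the URL and User-Agent string from the question.

```python
import requests

pdf_url = "https://www.npci.org.in/PDF/nach/circular/2015-16/Circular_No_126.pdf"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:124.0) Gecko/20100101 Firefox/124.0"
}

# A second positional argument to requests.get() is `params`, i.e. query-string
# parameters. Prepare (but do not send) both variants to compare them offline.
positional = requests.Request("GET", pdf_url, params=headers).prepare()
keyword = requests.Request("GET", pdf_url, headers=headers).prepare()

print(positional.url)                        # ...Circular_No_126.pdf?User-Agent=Mozilla%2F5.0+...
print(positional.headers.get("User-Agent"))  # None: no User-Agent header on this request
print(keyword.url)                           # plain URL, no query string
print(keyword.headers["User-Agent"])         # the Firefox User-Agent string
```

When the positional form is actually sent through `requests.get`, the session adds its default `python-requests/<version>` User-Agent header, so the server does not see the browser-like string at all. Whether that is what triggers the rejection on this particular site is an assumption; its filtering rules are not known from here.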
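Your request is being rejected, so you are not getting the document you expect. Instead, you receive an HTML page showing an error, as this code demonstrates: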
import requests
url = "https://www.npci.org.in/PDF/nach/circular/2015-16/Circular_No_126.pdf"
target = url.split("/")[-1]
with requests.get(url, stream=True) as response:
    response.raise_for_status()
    with open(target, "wb") as output:
        for chunk in response.iter_content(4096):
            output.write(chunk)
with open(target) as result:
    print(result.read())
Output:
<html><head><title>Request Rejected</title></head><body>The requested URL was rejected. Please consult with your administrator.<br><br>Your support ID is: <11940751230596137624><br><br><a href='javascript:history.back();'>[Go Back]</body></html>
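Because the rejection page arrives without an HTTP error status (the `raise_for_status()` call above did not raise), checking the status code alone cannot tell a real PDF from this HTML page. A minimal sketch, assuming that an accepted download is a genuine PDF beginning with the standard `%PDF-` header, is to inspect the first bytes of the stream before saving anything. `save_pdf_chunks` and `PDF_MAGIC` are hypothetical names used only for illustration.

```python
from pathlib import Path
from typing import Iterable

PDF_MAGIC = b"%PDF-"  # PDF files begin with this header signature


def save_pdf_chunks(chunks: Iterable[bytes], target: Path) -> int:
    """Write a chunked download to `target` only if it starts like a PDF.

    Raises ValueError without creating the file when the data starts with
    anything else, such as an HTML "Request Rejected" page.
    Returns the number of bytes written.
    """
    iterator = iter(chunks)
    head = b""
    # Collect at least len(PDF_MAGIC) bytes, even if the chunks are tiny.
    for chunk in iterator:
        head += chunk
        if len(head) >= len(PDF_MAGIC):
            break
    if not head.startswith(PDF_MAGIC):
        preview = head[:60].decode("latin-1")
        raise ValueError(f"not a PDF; response starts with {preview!r}")
    written = 0
    with open(target, "wb") as output:
        output.write(head)
        written += len(head)
        # Stream the remaining chunks straight to disk.
        for chunk in iterator:
            output.write(chunk)
            written += len(chunk)
    return written


# Possible usage with requests (not run here; needs network access):
# with requests.get(url, headers=headers, stream=True) as response:
#     response.raise_for_status()
#     save_pdf_chunks(response.iter_content(4096), Path(target))
```

The helper takes any iterable of byte chunks rather than a `Response` object, which keeps it testable offline; with the URL above it would raise `ValueError` showing the start of the `Request Rejected` page instead of silently saving HTML under a `.pdf` name.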