I want to download the .pdf files behind the URLs in a list.
This is the list:
[{'accessUrl': 'https://www.itzstedt.sitzung-online.de/bi/oparl/1.0/download.asp?dtyp=130&id=285',
'created': '2017-11-30T12:00:00+01:00',
'downloadUrl': None,
'file': None,
'fileName': '285.pdf',
'id': 'https://www.itzstedt.sitzung-online.de/bi/oparl/1.0/files.asp?dtyp=130&id=285',
'modified': '2017-12-06T10:34:46+01:00',
'name': 'Vorlage-Sammeldokument',
'reference': 'AA/2017/0001'},
{'accessUrl': 'https://www.itzstedt.sitzung-online.de/bi/oparl/1.0/download.asp?dtyp=130&id=288',
'created': '2017-11-30T12:00:00+01:00',
'downloadUrl': None,
'file': None,
'fileName': '288.pdf',
'id': 'https://www.itzstedt.sitzung-online.de/bi/oparl/1.0/files.asp?dtyp=130&id=288',
'modified': '2017-11-30T17:02:39+01:00',
'name': 'Vorlage-Sammeldokument',
'reference': 'AA/2017/0002'},
{'accessUrl': 'https://www.itzstedt.sitzung-online.de/bi/oparl/1.0/download.asp?dtyp=130&id=328',
'created': '2017-11-30T12:00:00+01:00',
'downloadUrl': None,
'file': None,
'fileName': '328.pdf',
'id': 'https://www.itzstedt.sitzung-online.de/bi/oparl/1.0/files.asp?dtyp=130&id=328',
'modified': '2017-11-30T18:23:44+01:00',
...
'id': 'http://www.hagenbach.sitzung-online.de/bi/oparl/1.0/files.asp?dtyp=130&id=16438',
'modified': '2017-01-04T11:22:42+01:00',
'name': 'Vorlage-Sammeldokument',
'reference': 'VO/2016/607'}]
The URLs are in "downloadUrl". I wrote this code:
import os
import requests
from urllib.parse import urlparse

for url in papers_docs:
    download_url = url.get("downloadUrl")
    if download_url:
        response = requests.get(download_url)
        # Use the record's fileName; otherwise take the last path segment of the URL
        file_name = url.get("fileName", os.path.basename(urlparse(download_url).path))
        with open(file_name, "wb") as f:
            f.write(response.content)
        print(f"Downloaded {file_name}")
    else:
        print(f"Failed to download {download_url}")
My output:
Failed to download None
Failed to download None
Failed to download None
Failed to download None
Failed to download None
Failed to download None
Failed to download None
Failed to download None
Failed to download None
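The output already points at the cause: the `else` branch prints `download_url` itself, and that branch runs only when the value is falsy, so it can only ever print `None` here. A related detail is that `dict.get(key, default)` falls back to `default` only when the key is missing. A key that is present with the value `None` still returns `None`. A minimal illustration, using one record trimmed from the list above:

```python
# One record trimmed from the list above: "downloadUrl" exists, but its value is None.
record = {
    "accessUrl": "https://www.itzstedt.sitzung-online.de/bi/oparl/1.0/download.asp?dtyp=130&id=285",
    "downloadUrl": None,
    "fileName": "285.pdf",
}

# The key is present, so the default is ignored and None comes back.
print(record.get("downloadUrl", "fallback"))  # None

# The default is used only when the key is missing entirely.
print(record.get("noSuchKey", "fallback"))  # fallback
```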
I have checked whether the data is there:
for url in papers_docs:
    download_url = url.get("downloadUrl")
    if download_url:
        print(download_url)
The links are there, but I can't figure out what's wrong.
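Note that the check above prints only truthy values, so if no URL appears in its output, every `downloadUrl` is `None`. One way to confirm this is to count how many records have a non-empty value under each key. A minimal sketch; `count_populated` is a hypothetical helper, and `papers_docs` here is a two-record stand-in built from the list above rather than the real data:

```python
# Two-record stand-in for papers_docs (values copied from the list above).
papers_docs = [
    {"accessUrl": "https://www.itzstedt.sitzung-online.de/bi/oparl/1.0/download.asp?dtyp=130&id=285",
     "downloadUrl": None, "fileName": "285.pdf"},
    {"accessUrl": "https://www.itzstedt.sitzung-online.de/bi/oparl/1.0/download.asp?dtyp=130&id=288",
     "downloadUrl": None, "fileName": "288.pdf"},
]


def count_populated(records, keys):
    """Hypothetical helper: count the records whose value under each key is truthy."""
    counts = {key: 0 for key in keys}
    for record in records:
        for key in keys:
            if record.get(key):
                counts[key] += 1
    return counts


counts = count_populated(papers_docs, ["downloadUrl", "accessUrl"])
print(counts)  # {'downloadUrl': 0, 'accessUrl': 2}
```

Run against the full list, a count of 0 for "downloadUrl" would show that no record actually carries a download link under that key.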
Try using accessUrl instead of downloadUrl, since the latter always seems to be None:
for url in papers_docs:
    download_url = url.get("accessUrl")