Download files from a list of URLs


I want to download the .pdf files behind the URLs in a list. This is the list:

[{'accessUrl': 'https://www.itzstedt.sitzung-online.de/bi/oparl/1.0/download.asp?dtyp=130&id=285',
  'created': '2017-11-30T12:00:00+01:00',
  'downloadUrl': None,
  'file': None,
  'fileName': '285.pdf',
  'id': 'https://www.itzstedt.sitzung-online.de/bi/oparl/1.0/files.asp?dtyp=130&id=285',
  'modified': '2017-12-06T10:34:46+01:00',
  'name': 'Vorlage-Sammeldokument',
  'reference': 'AA/2017/0001'},
 {'accessUrl': 'https://www.itzstedt.sitzung-online.de/bi/oparl/1.0/download.asp?dtyp=130&id=288',
  'created': '2017-11-30T12:00:00+01:00',
  'downloadUrl': None,
  'file': None,
  'fileName': '288.pdf',
  'id': 'https://www.itzstedt.sitzung-online.de/bi/oparl/1.0/files.asp?dtyp=130&id=288',
  'modified': '2017-11-30T17:02:39+01:00',
  'name': 'Vorlage-Sammeldokument',
  'reference': 'AA/2017/0002'},
 {'accessUrl': 'https://www.itzstedt.sitzung-online.de/bi/oparl/1.0/download.asp?dtyp=130&id=328',
  'created': '2017-11-30T12:00:00+01:00',
  'downloadUrl': None,
  'file': None,
  'fileName': '328.pdf',
  'id': 'https://www.itzstedt.sitzung-online.de/bi/oparl/1.0/files.asp?dtyp=130&id=328',
  'modified': '2017-11-30T18:23:44+01:00',
...
  'id': 'http://www.hagenbach.sitzung-online.de/bi/oparl/1.0/files.asp?dtyp=130&id=16438',
  'modified': '2017-01-04T11:22:42+01:00',
  'name': 'Vorlage-Sammeldokument',
  'reference': 'VO/2016/607'}]

The URLs are in 'downloadUrl'. I wrote this code:

import os
from urllib.parse import urlparse

import requests

for url in papers_docs:
    download_url = url.get("downloadUrl")
    if download_url:
        response = requests.get(download_url)
        file_name = url.get("fileName", os.path.basename(urlparse(download_url).path))
        with open(file_name, "wb") as f:
            f.write(response.content)
        print(f"Downloaded {file_name}")
    else:
        print(f"Failed to download {download_url}")

The output I get:

Failed to download None
Failed to download None
Failed to download None
Failed to download None
Failed to download None
Failed to download None
Failed to download None
Failed to download None
Failed to download None

I have tested whether the data is there:

for url in papers_docs:
    download_url = url.get("downloadUrl")
    if download_url:
        print(download_url)

The links are there, but I can't find the error.

file url python-requests
1 Answer

Try using 'accessUrl' instead of 'downloadUrl', because the latter always seems to be None:

for url in papers_docs:
    download_url = url.get("accessUrl")
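
A fuller version of that loop might look like the sketch below. It assumes papers_docs is the list shown in the question. The helper names pick_url, pick_file_name and download_all are made up for illustration, and the session argument is assumed to be a requests.Session (or any object with a compatible get method). It also falls back to the last URL path segment when 'fileName' is present but None, which the default argument of dict.get does not cover.

```python
import os
from urllib.parse import urlparse


def pick_url(entry):
    """Return 'accessUrl' if it is set, otherwise 'downloadUrl' (may be None)."""
    return entry.get("accessUrl") or entry.get("downloadUrl")


def pick_file_name(entry, url):
    """Prefer the entry's 'fileName'; fall back to the last URL path segment."""
    return entry.get("fileName") or os.path.basename(urlparse(url).path)


def download_all(entries, session, target_dir="."):
    """Download every entry that has a usable URL and return the saved paths.

    `session` is assumed to behave like requests.Session().
    """
    saved = []
    for entry in entries:
        url = pick_url(entry)
        if not url:
            # Neither key holds a URL, so there is nothing to fetch.
            print(f"No URL for {entry.get('id')}")
            continue
        response = session.get(url, timeout=30)
        response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
        path = os.path.join(target_dir, pick_file_name(entry, url))
        with open(path, "wb") as f:
            f.write(response.content)
        print(f"Downloaded {path}")
        saved.append(path)
    return saved
```

With requests installed, this could be called as download_all(papers_docs, requests.Session()). Passing the session in keeps the helpers usable without network access, for example with a stub object in a test.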