Python progress bar and downloads

Question (votes: 0, answers: 17)

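I have a Python script that launches a URL pointing to a downloadable file. Is there a way to have Python display the download progress instead of launching the browser?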

python download progress-bar
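17 answers
160
votes

I just wrote a super simple (slightly hacky) approach for scraping PDFs off a certain site. Note that it only works correctly on Unix-based systems (Linux, macOS), because PowerShell does not handle "\r":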

import sys
import requests

link = "http://indy/abcde1245"
file_name = "download.data"
with open(file_name, "wb") as f:
    print("Downloading %s" % file_name)
    response = requests.get(link, stream=True)
    total_length = response.headers.get('content-length')

    if total_length is None: # no content length header
        f.write(response.content)
    else:
        dl = 0
        total_length = int(total_length)
        for data in response.iter_content(chunk_size=4096):
            dl += len(data)
            f.write(data)
            done = int(50 * dl / total_length)
            sys.stdout.write("\r[%s%s]" % ('=' * done, ' ' * (50-done)) )    
            sys.stdout.flush()

It uses the requests library, so you will need to install it. It prints something like the following to your console:

>Downloading download.data

>[==============                           ]

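The progress bar in the script is 52 characters wide: 2 characters are the [] brackets, which leaves 50 characters of progress. Each = represents 2% of the download.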
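
The width arithmetic above can be checked in isolation. The following is a minimal sketch, not part of the original answer: `render_bar` is a hypothetical helper name, and the clamp is an added assumption to keep the bar from overflowing if more bytes arrive than the header advertised.

```python
def render_bar(downloaded: int, total: int, width: int = 50) -> str:
    """Return a bar such as '[=====     ]' for the given byte counts."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Each '=' stands for 1/width of the download (2% when width is 50).
    # Clamp so that extra bytes cannot push the bar past its width.
    filled = min(width, int(width * downloaded / total))
    return "[%s%s]" % ("=" * filled, " " * (width - filled))


if __name__ == "__main__":
    for done in (0, 2_500, 5_000, 10_000):
        print(render_bar(done, 10_000))
```

In the download loop, `sys.stdout.write("\r" + render_bar(dl, total_length))` would replace the inline formatting.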


84
votes

You can use the "clint" package (written by the same author as "requests") to add a simple progress bar to your downloads, like this:

import requests
from clint.textui import progress

r = requests.get(url, stream=True)
path = '/some/path/for/file.txt'
with open(path, 'wb') as f:
    total_length = int(r.headers.get('content-length'))
    for chunk in progress.bar(r.iter_content(chunk_size=1024), expected_size=(total_length/1024) + 1): 
        if chunk:
            f.write(chunk)
            f.flush()

This gives you a dynamic output that looks like this:

[################################] 5210/5210 - 00:00:01

It should work on multiple platforms as well! You can also change the bar to dots or a spinner by using .dots or .mill instead of .bar.

Enjoy!


62
votes

Python 3 with TQDM

This is the technique suggested in the TQDM docs.

import urllib.request

from tqdm import tqdm


class DownloadProgressBar(tqdm):
    def update_to(self, b=1, bsize=1, tsize=None):
        if tsize is not None:
            self.total = tsize
        self.update(b * bsize - self.n)


def download_url(url, output_path):
    with DownloadProgressBar(unit='B', unit_scale=True,
                             miniters=1, desc=url.split('/')[-1]) as t:
        urllib.request.urlretrieve(url, filename=output_path, reporthook=t.update_to)
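
To see what `update_to` does with the values `urlretrieve` passes to its reporthook (block count so far, block size, total size), here is a small sketch that needs neither tqdm nor a network connection. `CountingBar` is a hypothetical stand-in for the tqdm bar, and the 10000-byte file read in 4096-byte blocks is an invented example.

```python
class CountingBar:
    """Hypothetical stand-in for a tqdm bar: tracks only `n` and `total`."""

    def __init__(self):
        self.n = 0
        self.total = None
        self.deltas = []  # every increment passed to update(), kept for inspection

    def update(self, delta):
        self.deltas.append(delta)
        self.n += delta

    def update_to(self, b=1, bsize=1, tsize=None):
        # Same arithmetic as update_to() in the answer above:
        # turn the cumulative position b * bsize into an increment.
        if tsize is not None:
            self.total = tsize
        self.update(b * bsize - self.n)


bar = CountingBar()
# Simulated reporthook calls: one at the start (block 0), then one per block read.
for block_number in range(4):
    bar.update_to(block_number, 4096, 10000)

print(bar.deltas, bar.n, bar.total)
```

Because the hook reports whole blocks, the final position (12288 here) can overshoot the real size (10000); the last block is counted at full size even when it is shorter.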

33
votes

Here is an answer that uses requests and tqdm.

import requests
from tqdm import tqdm


def download(url: str, fname: str):
    resp = requests.get(url, stream=True)
    total = int(resp.headers.get('content-length', 0))
    # Can also replace 'file' with a io.BytesIO object
    with open(fname, 'wb') as file, tqdm(
        desc=fname,
        total=total,
        unit='iB',
        unit_scale=True,
        unit_divisor=1024,
    ) as bar:
        for data in resp.iter_content(chunk_size=1024):
            size = file.write(data)
            bar.update(size)

Gist: https://gist.github.com/yanqd0/c13ed29e29432e3cf3e7c38467f42f51


11
votes

Another good option is wget:

import wget
wget.download('http://download.geonames.org/export/zip/US.zip')

The output will look like this:

11% [........                                     ] 73728 / 633847

Source: https://medium.com/@petehouston/download-files-with-progress-in-python-96f14f6417a2


9
votes

You can also use click. It has a good library for progress bars:

import click

with click.progressbar(length=total_size, label='Downloading files') as bar:
    for file in files:
        download(file)
        bar.update(file.size)

8
votes

Sorry for the late reply; I just updated the tqdm docs:

https://github.com/tqdm/tqdm/#hooks-and-callbacks

Using urlretrieve and OOP:

import urllib.request
from tqdm.auto import tqdm

class TqdmUpTo(tqdm):
    """Provides `update_to(n)` which uses `tqdm.update(delta_n)`."""
    def update_to(self, b=1, bsize=1, tsize=None):
        """
        b  : Blocks transferred so far
        bsize  : Size of each block
        tsize  : Total size
        """
        if tsize is not None:
            self.total = tsize
        self.update(b * bsize - self.n)  # will also set self.n = b * bsize

eg_link = "https://github.com/tqdm/tqdm/releases/download/v4.46.0/tqdm-4.46.0-py2.py3-none-any.whl"
eg_file = eg_link.split('/')[-1]
with TqdmUpTo(unit='B', unit_scale=True, unit_divisor=1024, miniters=1,
              desc=eg_file) as t:  # all optional kwargs
    # Python 3: urlretrieve lives in urllib.request (urllib.urlretrieve in Python 2)
    urllib.request.urlretrieve(
        eg_link, filename=eg_file, reporthook=t.update_to, data=None)
    t.total = t.n

Or using requests.get and file wrappers:

import requests
from tqdm.auto import tqdm

eg_link = "https://github.com/tqdm/tqdm/releases/download/v4.46.0/tqdm-4.46.0-py2.py3-none-any.whl"
eg_file = eg_link.split('/')[-1]
response = requests.get(eg_link, stream=True)
with tqdm.wrapattr(open(eg_file, "wb"), "write", miniters=1,
                   total=int(response.headers.get('content-length', 0)),
                   desc=eg_file) as fout:
    for chunk in response.iter_content(chunk_size=4096):
        fout.write(chunk)

You can of course mix and match techniques.


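4
votes

The tqdm package now includes a function designed specifically for this kind of situation: wrapattr. You just wrap an object's read (or write) attribute, and tqdm handles the rest. Here is a simple download function that puts it all together with requests: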

def download(url, filename):
    import functools
    import pathlib
    import shutil
    import requests
    import tqdm
    
    r = requests.get(url, stream=True, allow_redirects=True)
    if r.status_code != 200:
        r.raise_for_status()  # Raises only for 4xx/5xx codes, so...
        raise RuntimeError(f"Request to {url} returned status code {r.status_code}")
    file_size = int(r.headers.get('Content-Length', 0))

    path = pathlib.Path(filename).expanduser().resolve()
    path.parent.mkdir(parents=True, exist_ok=True)

    desc = "(Unknown total file size)" if file_size == 0 else ""
    r.raw.read = functools.partial(r.raw.read, decode_content=True)  # Decompress if needed
    with tqdm.tqdm.wrapattr(r.raw, "read", total=file_size, desc=desc) as r_raw:
        with path.open("wb") as f:
            shutil.copyfileobj(r_raw, f)

    return path
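
The idea behind wrapping `read` can be sketched with the standard library alone. This is not tqdm's implementation; `CountingReader` and its `on_bytes` callback are hypothetical names, and an in-memory buffer stands in for the HTTP response.

```python
import io
import shutil


class CountingReader:
    """Hypothetical wrapper: forwards read() and reports how many bytes came back."""

    def __init__(self, raw, on_bytes):
        self._raw = raw
        self._on_bytes = on_bytes

    def read(self, size=-1):
        data = self._raw.read(size)
        self._on_bytes(len(data))  # a progress bar would call update() here
        return data


seen = []  # sizes reported by each read() call
source = io.BytesIO(b"x" * 10_000)  # stand-in for r.raw
sink = io.BytesIO()  # stand-in for the output file
shutil.copyfileobj(CountingReader(source, seen.append), sink, 4096)

print(sum(seen), "bytes copied in", len(seen), "reads")
```

The copy loop itself never mentions progress; the wrapper observes every `read` call, which is why `shutil.copyfileobj` can be used unchanged in the answer above.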

4
votes

# Define the progress bar function

def print_progressbar(total, current, barsize=60):
    progress = int(current*barsize/total)
    completed = str(int(current*100/total)) + '%'
    print('[', chr(9608)*progress, ' ', completed, '.'*(barsize-progress), '] ', str(current)+'/'+str(total), sep='', end='\r', flush=True)

# Example code

total = 6000
barsize = 60
print_frequency = max(min(total//barsize, 100), 1)
print("Start Task..", flush=True)
for i in range(1, total+1):
  if i%print_frequency == 0 or i == 1:
    print_progressbar(total, i, barsize)
print("\nFinished", flush=True)

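# Progress bar snapshots:

The lines below are for illustration only. In the command prompt you will see a single progress bar that updates in place.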

[ 0%............................................................] 1/6000

[██████████ 16%..................................................] 1000/6000

[████████████████████ 33%........................................] 2000/6000

[██████████████████████████████ 50%..............................] 3000/6000

[████████████████████████████████████████ 66%....................] 4000/6000

[██████████████████████████████████████████████████ 83%..........] 5000/6000

[████████████████████████████████████████████████████████████ 100%] 6000/6000

0
votes

Just some improvements to @rich-jones's answer:

import re
import requests
from clint.textui import progress

def get_filename(cd):
    """
    Get filename from content-disposition
    """
    if not cd:
        return None
    fname = re.findall('filename=(.+)', cd)
    if len(fname) == 0:
        return None
    return fname[0].replace('"', "")

def stream_download_file(url, output, chunk_size=1024, session=None, verbose=False):
    
    if session:
        file = session.get(url, stream=True)
    else:
        file = requests.get(url, stream=True)
        
    file_name = get_filename(file.headers.get('content-disposition'))
    filepath = "{}/{}".format(output, file_name)
    
    if verbose: 
        print ("Downloading {}".format(file_name))
        
    with open(filepath, 'wb') as f:
        total_length = int(file.headers.get('content-length'))
        for chunk in progress.bar(file.iter_content(chunk_size=chunk_size), expected_size=(total_length/chunk_size) + 1): 
            if chunk:
                f.write(chunk)
                f.flush()
    if verbose: 
        print ("Finished")
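
As an alternative to the regular expression in `get_filename`, the standard library can parse the header parameters. This is a sketch under the assumption that the header follows the usual `attachment; filename="..."` form; `filename_from_content_disposition` is a hypothetical name.

```python
from email.message import Message


def filename_from_content_disposition(header_value):
    """Return the filename parameter of a Content-Disposition value, or None."""
    if not header_value:
        return None
    msg = Message()
    msg["Content-Disposition"] = header_value
    # get_filename() reads the 'filename' parameter and strips surrounding quotes.
    return msg.get_filename()


print(filename_from_content_disposition('attachment; filename="report.pdf"'))
```

Unlike `filename=(.+)`, this does not swallow parameters that follow the filename, such as `; size=10`.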

0
votes

I came up with a slightly better-looking solution built on tqdm. My implementation is based on @Endophage's answer.

The effect:

# import the download_file definition from the next cell first.
>>> download_file(url, 'some_data.dat')
Downloading some_data.dat.
  7%|█▎                  | 195.31MB/2.82GB:  [00:04<01:02, 49.61MB/s]

Implementation:

import time
import math
import requests
from tqdm import tqdm


def download_file(url, filename, update_interval=500, chunk_size=4096):
    def memory2str(mem):
        sizes = ['B', 'KB', 'MB', 'GB', 'TB', 'PB']
        power = int(math.log(mem, 1024))
        size = sizes[power]
        for _ in range(power):
            mem /= 1024
        if power > 0:
            return f'{mem:.2f}{size}'
        else:
            return f'{mem}{size}'
    with open(filename, 'wb') as f:
        response = requests.get(url, stream=True)
        total_length = response.headers.get('content-length')
        if total_length is None:
            f.write(response.content)
        else:
            print(f'Downloading {filename}.', flush=True)
            downloaded, total_length = 0, int(total_length)
            total_size = memory2str(total_length)
            bar_format = '{percentage:3.0f}%|{bar:20}| {desc} [{elapsed}<{remaining}' \
                         '{postfix}]'
            if update_interval * chunk_size * 100 >= total_length:
                update_interval = 1
            with tqdm(total=total_length, bar_format=bar_format) as bar:
                counter = 0
                now_time, now_size = time.time(), downloaded
                for data in response.iter_content(chunk_size=chunk_size):
                    f.write(data)
                    downloaded += len(data)
                    counter += 1
                    bar.update(len(data))
                    if counter % update_interval == 0:
                        ellapsed = time.time() - now_time
                        runtime_downloaded = downloaded - now_size
                        now_time, now_size = time.time(), downloaded

                        cur_size = memory2str(downloaded)
                        speed_size = memory2str(runtime_downloaded / ellapsed)
                        bar.set_description(f'{cur_size}/{total_size}')
                        bar.set_postfix_str(f'{speed_size}/s')

                        counter = 0
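
One caveat with `memory2str`: it calls `math.log(mem, 1024)`, which raises `ValueError` when `mem` is 0 (for example, if no bytes arrived during an interval). A loop-based variant avoids the logarithm. This is a sketch, with `format_size` as a hypothetical replacement name, and it rounds byte values to whole numbers, which differs slightly from the original.

```python
def format_size(num_bytes: float) -> str:
    """Format a byte count with the same units as memory2str, accepting 0."""
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    size = float(num_bytes)
    index = 0
    # Divide by 1024 until the value fits the current unit or units run out.
    while size >= 1024 and index < len(units) - 1:
        size /= 1024
        index += 1
    if index == 0:
        return f"{int(size)}{units[index]}"
    return f"{size:.2f}{units[index]}"


print(format_size(0), format_size(1536), format_size(1024 ** 3))
```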

0
votes

A simple solution using the wget and tqdm Python libraries that shows progress in megabytes and the remaining time:

MB:  37%|███▋      | 2044.8/5588.7 [02:57<04:30, 13.11it/s]
  • Install the libraries

    pip3 install wget tqdm

  • Import the libraries

    import wget
    from tqdm import tqdm
    
  • Wrapper class for tqdm

    class ProgressBar:
    
      def __init__(self):
          self.progress_bar = None
    
      def __call__(self, current_bytes, total_bytes, width):
          current_mb = round(current_bytes / 1024 ** 2, 1)
          total_mb = round(total_bytes / 1024 ** 2, 1)
          if self.progress_bar is None:
              self.progress_bar = tqdm(total=total_mb, desc="MB")
          delta_mb = current_mb - self.progress_bar.n
          self.progress_bar.update(delta_mb)
    
  • How to use it

    wget.download(url, dst_filepath, ProgressBar())
    

0
votes

This is the "Goat Progress bar" implementation from George Hotz.

import requests
from tqdm import tqdm

r = requests.get(url, stream=True)
progress_bar = tqdm(total=int(r.headers.get('content-length', 0)), unit='B', unit_scale=True, desc=url)
dat = b''.join(x for x in r.iter_content(chunk_size=16384) if progress_bar.update(len(x)) or True)

cc: https://github.com/geohot/tinygrad/commit/7118602c976d264d97af3c1c8b97d72077616d07


0
votes

You can easily use the dlbar module:

python3 -m pip install dlbar

Just import it and call the download method:

from dlbar import DownloadBar

download_bar = DownloadBar()

download_bar.download(
    url='https://url',
    dest='/a/b/c/downloaded_file.suffix',
    title='Downloading downloaded_file.suffix'
)

Output:

Downloading downloaded_file.suffix
43% █████████████████████----------------------------- 197.777 MB/450.327 MB

You can also customize the download bar. See the dlbar documentation for more information.


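0
votes

I adapted many of the great suggestions here to fit my case.

I needed to download a large .txt file (>2.5 GB). Each line of the text file contains a unique paragraph, so I needed to retrieve a list of paragraphs from the file.

Note that the code below is not 100% bulletproof: chunk boundaries may not fall exactly at the end or start of a paragraph, so a paragraph can be split in two. In my case, however, this was not a problem. Increasing chunk_size reduces the number of "broken" paragraphs.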

import requests
from tqdm import tqdm


def DownloadFile(url):
    req = requests.get(url, stream=True)
    total_length = int(req.headers.get('content-length'))
    chunk_size = 4194304  # 4 MiB
    steps = total_length / chunk_size
    data = []
    for chunk in tqdm(req.iter_content(chunk_size=chunk_size), total=steps):
        text = chunk.decode("utf-8", "ignore")
        for line in text.split("\n"):
            data.append(line.rstrip())
    return data
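
If split paragraphs do matter, one option is to carry the unfinished tail of each chunk over to the next. The sketch below is not the author's method; `iter_lines` is a hypothetical helper, and it uses an incremental decoder so that a multi-byte UTF-8 character cut by a chunk boundary is also kept intact.

```python
import codecs


def iter_lines(chunks, encoding="utf-8"):
    """Yield complete text lines from an iterable of byte chunks."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="ignore")
    pending = ""  # text after the last newline seen so far
    for chunk in chunks:
        pending += decoder.decode(chunk)
        # Everything before the last newline is complete; keep the rest for later.
        *complete, pending = pending.split("\n")
        for line in complete:
            yield line.rstrip()
    pending += decoder.decode(b"", final=True)
    if pending:
        yield pending.rstrip()


example_chunks = [b"first para", b"graph\nsecond\nthi", b"rd"]
print(list(iter_lines(example_chunks)))
```

With requests, the chunks would come from `req.iter_content(chunk_size=chunk_size)`; requests also offers `iter_lines()` on the response for the same purpose.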

0
votes

I was missing a solution without dependencies, so here it is:

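from urllib.request import urlretrieve


# Defined before use so the name exists when the __main__ block runs.
def printProgress(blocknum, bs, size):
    percent = (blocknum * bs) / size
    done = "#" * int(40 * percent)
    print(f'\r[{done:<40}] {percent:.1%}', end='')


if __name__ == '__main__':
    # url and filename must be set to the download source and destination.
    urlretrieve(url, filename, printProgress)
    print(end='\r')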

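The print after urlretrieve returns the cursor to the start of the line, so the next output overwrites the progress bar. You can use a different progress bar width (40) if you like.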
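
Two edge cases are worth handling in a reporthook like this: `urlretrieve` passes a size of -1 when the server sends no Content-Length, and the last block is counted at full size, so the ratio can exceed 100%. A sketch of a formatter that covers both follows; `format_progress` is a hypothetical name and the fallback text is an arbitrary choice.

```python
def format_progress(blocknum, bs, size, width=40):
    """Build the progress text for one reporthook call."""
    received = blocknum * bs
    if size <= 0:
        # Unknown total (urlretrieve passes -1): show a byte count instead of a bar.
        return f"{received} bytes"
    fraction = min(received / size, 1.0)  # clamp the overshoot from the final block
    done = "#" * int(width * fraction)
    return f"[{done:<{width}}] {fraction:.1%}"


print(format_progress(1, 2500, 10000))
print(format_progress(2, 8192, -1))
```

Inside the hook, `print('\r' + format_progress(blocknum, bs, size), end='')` would replace the inline formatting.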


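-1
votes

You can stream a download (see: Streaming Downloads).

You can also stream uploads.

Most importantly, a streaming request does not download the response body until you access the response content. It takes just 2 lines: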

for line in r.iter_lines():    
    if line:
        print(line)

Streaming Requests
