How can I count the number of lines at each URL using concurrent.futures?


I have several URLs (see the URLS list in the code below).

Each URL serves data like the following:

line1
line2
line3
line4
line5

I want to count the number of lines at each URL using concurrent.futures.

Here is what I have so far:

import concurrent.futures
import urllib.request

URLS = ['https://website.com/files/123.txt',
        'https://website.com/files/456.txt',
        'https://website.com/files/789.txt']

# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
python-3.x concurrent.futures
1 Answer

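If you want to count the number of lines in each URL's data, you can write a function, count_lines, that returns the line count instead of the data itself: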

def count_lines(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        data = conn.read()
        num_lines = len(data.splitlines())
        return num_lines
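
To see what len(data.splitlines()) actually counts, here is a small offline sketch. The helper name count_lines_in_bytes and the sample payloads are made up for illustration; only the bytes.splitlines() call comes from the function above.

```python
def count_lines_in_bytes(data: bytes) -> int:
    # bytes.splitlines() breaks on \n, \r\n and \r, and does not add an
    # empty trailing item when the data ends with a line break.
    return len(data.splitlines())


# Hypothetical payloads standing in for the body returned by conn.read()
print(count_lines_in_bytes(b"line1\nline2\nline3\nline4\nline5\n"))  # 5
print(count_lines_in_bytes(b"a\r\nb"))  # 2 (no trailing newline)
print(count_lines_in_bytes(b""))  # 0
```

Note that a final line without a trailing newline is still counted, which is not the case if you count newline characters with data.count(b"\n").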

Then modify the rest of your code like this:

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    future_to_url = {executor.submit(count_lines, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            num_lines = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r has %d lines' % (url, num_lines))
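
If you would rather collect the counts than print them, the same pattern can be wrapped in a function. This is only a sketch under a few assumptions: the names count_lines_streaming and count_all and the counter parameter are hypothetical, and the streaming variant is an alternative to the conn.read() approach above, not a replacement for it. It iterates over the response object, which yields one line at a time split on \n, so the whole body is never held in memory (unlike splitlines(), a bare \r is not treated as a line break here).

```python
import concurrent.futures
import urllib.request


def count_lines_streaming(url, timeout=60):
    # Iterate over the response line by line instead of reading it all at once.
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return sum(1 for _ in conn)


def count_all(urls, counter=count_lines_streaming, max_workers=5):
    # Map each URL to its line count, or to the exception raised while counting.
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_url = {executor.submit(counter, url): url for url in urls}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                results[url] = exc
    return results


# Usage with the URLS list from the question (performs network requests):
#     for url, outcome in count_all(URLS).items():
#         print(url, outcome)
```

Passing the counting function in as a parameter also lets you try the threading logic without network access, for example count_all(["a", "bb"], counter=len).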