I have multiple URLs, as shown below. Each URL points to data like this:
line1
line2
line3
line4
line5
I want to count the number of lines for each URL using concurrent.futures. Here is what I have so far:
import concurrent.futures
import urllib.request
URLS = ['https://website.com/files/123.txt',
        'https://website.com/files/456.txt',
        'https://website.com/files/789.txt']

# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
If you want to count the lines in each URL's data, you can create a count_lines function that returns the line count instead of the raw data:
def count_lines(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        data = conn.read()
        # splitlines() handles \n and \r\n, with or without a trailing newline
        return len(data.splitlines())
Then modify the rest of your code like this:
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    future_to_url = {executor.submit(count_lines, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            num_lines = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r has %d lines' % (url, num_lines))
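As a quick local sanity check of the counting logic (no network needed), the same splitlines-based approach can be exercised with in-memory byte strings standing in for the HTTP responses. The payloads below are hypothetical stand-ins for your file contents:

```python
import concurrent.futures

def count_lines_in(data: bytes) -> int:
    # bytes.splitlines() handles \n and \r\n, and does not
    # require a trailing newline on the last line
    return len(data.splitlines())

# Hypothetical payloads standing in for the URL contents
PAYLOADS = {
    'https://website.com/files/123.txt': b'line1\nline2\nline3\nline4\nline5\n',
    'https://website.com/files/456.txt': b'a\r\nb\r\nc',
}

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    future_to_url = {executor.submit(count_lines_in, data): url
                     for url, data in PAYLOADS.items()}
    for future in concurrent.futures.as_completed(future_to_url):
        print('%r has %d lines' % (future_to_url[future], future.result()))
```

Note that conn.read() returns bytes, so counting happens on bytes without decoding; if your files use an unusual encoding you may want to decode() first and split the resulting str instead.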