I have multiple REST API endpoints (URLs) that stream data. I'd like to know the best way to read all of them, in one or more processes.
Currently I read data from only one URL, doing the following:
s = requests.Session()
resp = s.get(url, headers=headers, stream=True)
for line in resp.iter_lines():
    if line:
        print(line)
I'd like to do the same thing for more URLs, and I'm wondering what the best approach is here.
Here is an example of how to read multiple URLs using concurrent.futures.ThreadPoolExecutor. But this is just one approach; you could also use multiprocessing, asyncio/aiohttp, etc.
from concurrent.futures import ThreadPoolExecutor

import requests


def get_from_api(tpl):
    session, url = tpl
    resp = session.get(url, stream=True)

    # Just for example: count the lines in the streamed response.
    count_lines = 0
    for line in resp.iter_lines():
        count_lines += 1
    return url, count_lines
def main():
    api_urls = [
        "https://google.com",
        "https://yahoo.com",
        "https://facebook.com",
        "https://instagram.com",
        # ...etc.
    ]

    # requests.Session() (capital S) is the class; the lowercase
    # requests.session() is a deprecated alias.
    with ThreadPoolExecutor(max_workers=2) as pool, requests.Session() as session:
        for url, count_lines in pool.map(
            get_from_api, ((session, url) for url in api_urls)
        ):
            print(url, count_lines)


if __name__ == "__main__":
    main()
Prints:
https://google.com 17
https://yahoo.com 648
https://facebook.com 26
https://instagram.com 50
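For comparison, the asyncio/aiohttp route mentioned above could look roughly like the following. This is a sketch, not a drop-in replacement: it assumes aiohttp is installed, and it counts raw lines from the response body stream rather than using requests' iter_lines().

```python
import asyncio

import aiohttp


async def get_from_api(session, url):
    # Stream the response body and count its lines,
    # mirroring the threaded example above.
    async with session.get(url) as resp:
        count_lines = 0
        async for line in resp.content:
            count_lines += 1
        return url, count_lines


async def main():
    api_urls = [
        "https://google.com",
        "https://yahoo.com",
        # ...etc.
    ]

    async with aiohttp.ClientSession() as session:
        # Run all requests concurrently on one event loop thread.
        results = await asyncio.gather(
            *(get_from_api(session, url) for url in api_urls)
        )

    for url, count_lines in results:
        print(url, count_lines)


if __name__ == "__main__":
    asyncio.run(main())
```

Unlike the thread-pool version, concurrency here is limited only by the event loop (and aiohttp's default connection pool), so it can scale to many more URLs without one OS thread per worker.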