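I am trying to use asyncio to parallelize several long-running web requests. Because I am migrating from the requests library, I would like to use the httpx library, since its API is similar. My environment is a Python 3.7.7 Anaconda distribution (Windows 10) with all required packages installed.
However, although I can use httpx for synchronous web requests (or for async requests that run serially, one after another), I have not managed to run more than one async request at a time, even though this is easy to do with the aiohttp library.
The following sample code runs fine with aiohttp. (Note that I am running in Jupyter, so an event loop already exists, which is why there is no asyncio.run().)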
import aiohttp
import asyncio
import time
import httpx

async def call_url(session):
    url = "https://services.cancerimagingarchive.net/services/v3/TCIA/query/getCollectionValues"
    response = await session.request(method='GET', url=url)
    #response.raise_for_status()
    return response

for i in range(1,5):
    start = time.time() # start time for timing event
    async with aiohttp.ClientSession() as session: #use aiohttp
    #async with httpx.AsyncClient as session: #use httpx
        await asyncio.gather(*[call_url(session) for x in range(i)])
    print(f'{i} call(s) in {time.time() - start} seconds')
This produces the expected response-time profile:
1 call(s) in 7.9129478931427 seconds
2 call(s) in 8.876991510391235 seconds
3 call(s) in 9.730034589767456 seconds
4 call(s) in 10.630006313323975 seconds
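The shape of these timings (four concurrent calls taking only slightly longer than one) can be reproduced without any network access. The following is a minimal sketch, not the code from the question: fake_request and timed_batch are hypothetical names, and asyncio.sleep stands in for waiting on a real HTTP response.

```python
import asyncio
import time

async def fake_request(delay):
    # Stand-in for an HTTP call (assumption): sleeping yields to the event loop,
    # much as awaiting a real response would.
    await asyncio.sleep(delay)
    return delay

async def timed_batch(n, delay=0.2):
    # Run n stand-in requests concurrently and measure the wall-clock time.
    start = time.perf_counter()
    results = await asyncio.gather(*[fake_request(delay) for _ in range(n)])
    return len(results), time.perf_counter() - start

async def main():
    timings = []
    for n in range(1, 5):
        count, elapsed = await timed_batch(n)
        timings.append((count, elapsed))
        print(f"{count} call(s) in {elapsed:.2f} seconds")
    return timings

if __name__ == "__main__":
    asyncio.run(main())
```

In Jupyter, where an event loop is already running, the last line would be replaced by a plain `await main()`. If the requests ran serially instead, the batch of four would take roughly four times as long as the batch of one.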
However, if I uncomment async with httpx.AsyncClient as session: #use httpx and comment out async with aiohttp.ClientSession() as session: #use aiohttp (that is, swap aiohttp for httpx), I get the following error:
AttributeError Traceback (most recent call last)
<ipython-input-108-25244245165a> in async-def-wrapper()
17 await asyncio.gather(*[call_url(session) for x in range(i)])
18 print(f'{i} call(s) in {time.time() - start} seconds')
AttributeError: __aexit__
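In my research online, the only thing I could find was this Medium article by Simon Hawe, which shows how to use httpx for parallel requests. See https://medium.com/swlh/how-to-boost-your-python-apps-using-httpx-and-asynchronous-calls-9cfe6f63d6ad
However, the example async code does not even use an async session object, so I was somewhat skeptical. The code does not run in my Python 3.7.7 environment or in Jupyter. (The code is here: https://gist.githubusercontent.com/Shawe82/a218066975f4b325e026337806f8c781/raw/3cb492e971c13e76a07d1a1e77b48de94aa7229c/concurrent_download.py)
It produces this error: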
Traceback (most recent call last):
File ".\async_http_test.py", line 24, in <module>
asyncio.run(download_all_photos('100_photos'))
File "C:\Users\stborg\AppData\Local\Continuum\anaconda3\envs\fastai2\lib\asyncio\runners.py", line 43, in run
return loop.run_until_complete(main)
File "C:\Users\stborg\AppData\Local\Continuum\anaconda3\envs\fastai2\lib\asyncio\base_events.py", line 587, in run_until_complete
return future.result()
File ".\async_http_test.py", line 16, in download_all_photos
resp = await httpx.get("https://jsonplaceholder.typicode.com/photos")
TypeError: object Response can't be used in 'await' expression
I am clearly doing something wrong, since httpx is built for async. I am just not sure what!
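One possible reading of the TypeError above: the traceback suggests that, in the installed httpx version, the top-level httpx.get() is an ordinary synchronous function that returns a Response directly, and a Response cannot be awaited. The sketch below reproduces that class of error without httpx. FakeResponse, sync_get and async_get are hypothetical stand-ins, and the URL is a placeholder that is never contacted.

```python
import asyncio

class FakeResponse:
    """Hypothetical stand-in for a response object; it is not awaitable."""
    status_code = 200

def sync_get(url):
    # Plain function: the response comes back immediately, no coroutine involved.
    return FakeResponse()

async def async_get(url):
    # Coroutine function: calling it returns an awaitable.
    await asyncio.sleep(0)
    return FakeResponse()

async def demo():
    outcomes = {}
    try:
        # Awaiting the return value of a synchronous call raises TypeError.
        await sync_get("https://example.invalid/photos")
    except TypeError as exc:
        outcomes["sync_get"] = f"TypeError: {exc}"
    response = await async_get("https://example.invalid/photos")
    outcomes["async_get"] = response.status_code
    return outcomes

if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With httpx, the awaitable counterpart is a method on an AsyncClient instance, for example `await client.get(url)`.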
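While working further on this question, I found what appears to be a subtle difference in how httpx and aiohttp handle context managers.
In the code from the question, the following works with aiohttp: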
async with aiohttp.ClientSession() as session: #use aiohttp
    await asyncio.gather(*[call_url(session) for x in range(i)])
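This code passes the ClientSession context as an argument to the call_url function. I assume that once asyncio.gather() completes, the resources are cleaned up as with a normal with statement.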
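That assumption about cleanup order can be checked with a small sketch. RecordingSession is a hypothetical stand-in (not aiohttp or httpx) that logs when the context is entered, when each request finishes, and when the context exits.

```python
import asyncio

class RecordingSession:
    """Hypothetical stand-in for a client session that logs its lifecycle."""

    def __init__(self, log):
        self.log = log

    async def __aenter__(self):
        self.log.append("enter")
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Runs only after the body of the async with block has finished.
        self.log.append("exit")
        return False

    async def request(self, n):
        await asyncio.sleep(0.01 * n)  # simulated network wait
        self.log.append(f"request {n} done")
        return n

async def main():
    log = []
    async with RecordingSession(log) as session:
        # gather() returns only once every request has completed.
        results = await asyncio.gather(*[session.request(n) for n in range(1, 4)])
    return results, log

if __name__ == "__main__":
    print(asyncio.run(main()))
```

The "exit" entry is always last in the log, because `await asyncio.gather(...)` keeps the body of the async with block open until all requests have finished.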
However, the same approach with httpx fails, as shown above. This is easy to work around by avoiding the with statement altogether and closing the AsyncClient manually.
In other words, replacing
#async with httpx.AsyncClient as session: #use httpx
    await asyncio.gather(*[call_url(session) for x in range(i)])
with
session = httpx.AsyncClient() #use httpx
await asyncio.gather(*[call_url(session) for x in range(i)])
await session.aclose()
solves the problem.
Here is the complete working code:
import aiohttp
import asyncio
import time
import httpx

async def call_url(session):
    url = "https://services.cancerimagingarchive.net/services/v3/TCIA/query/getCollectionValues"
    response = await session.request(method='GET', url=url)
    return response

for i in range(1,5):
    start = time.time() # start time for timing event
    #async with aiohttp.ClientSession() as session: #use aiohttp
    session = httpx.AsyncClient() #use httpx
    await asyncio.gather(*[call_url(session) for x in range(i)])
    await session.aclose()
    print(f'{i} call(s) in {time.time() - start} seconds')
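A possible explanation for the original AttributeError: __aexit__ is worth noting. The failing line reads httpx.AsyncClient without parentheses, so async with receives the class itself rather than an instance, whereas the aiohttp line calls ClientSession(). The sketch below shows the same behaviour with a hypothetical DummyClient stand-in, so it does not depend on httpx. The exception type varies by Python version (AttributeError on 3.7, TypeError on recent releases), so both are caught.

```python
import asyncio

class DummyClient:
    """Hypothetical stand-in for an async client; this is not httpx."""

    def __init__(self):
        self.closed = False

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await self.aclose()

    async def aclose(self):
        self.closed = True

async def use_class():
    # Missing parentheses: the class object itself is handed to async with.
    try:
        async with DummyClient as client:
            pass
    except (AttributeError, TypeError) as exc:
        return type(exc).__name__
    return "no error"

async def use_instance():
    # With parentheses, an instance is created and the protocol is found.
    async with DummyClient() as client:
        pass
    return client.closed

if __name__ == "__main__":
    print(asyncio.run(use_class()))
    print(asyncio.run(use_instance()))
```

If this is indeed the cause, writing `async with httpx.AsyncClient() as session:` should work the same way as the aiohttp version, and it has the advantage of closing the client even when one of the gathered requests raises.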