I am trying to send HTTPS requests as quickly as possible. I know this would have to be concurrent requests, since my goal is 150 to 500+ requests per second. I have searched everywhere, but I can't find a Python 3.11+ answer, or one that doesn't give me errors. I am trying to avoid AIOHTTP, as the rigmarole of setting it up was a pain that didn't even work.

The input should be an array of URLs and the output an array of HTML strings.
It's quite unfortunate that you couldn't set up AIOHTTP properly, because it is one of the most efficient ways to do asynchronous requests in Python.

The setup is not that hard:
import asyncio
import aiohttp
from time import perf_counter


def urls(n_reqs: int):
    for _ in range(n_reqs):
        yield "https://python.org"


async def get(session: aiohttp.ClientSession, url: str):
    async with session.get(url) as response:
        _ = await response.text()


async def main(n_reqs: int):
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[get(session, url) for url in urls(n_reqs)]
        )


if __name__ == "__main__":
    n_reqs = 10_000
    start = perf_counter()
    asyncio.run(main(n_reqs))
    end = perf_counter()
    print(f"{n_reqs / (end - start)} req/s")
You essentially need to create a single ClientSession and then reuse it to send the GET requests. The requests are made concurrently with asyncio.gather(). You could also use the newer asyncio.TaskGroup:
async def main(n_reqs: int):
    async with aiohttp.ClientSession() as session:
        async with asyncio.TaskGroup() as group:
            for url in urls(n_reqs):
                group.create_task(get(session, url))
This easily achieves 500+ requests per second on the dual-core computer I have been using for 7+ years. Contrary to what other answers suggest, this solution does not require spawning thousands of threads, which are expensive.
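If firing all 10,000 requests at once is too aggressive for the target server, the number of in-flight requests can be capped with a semaphore. A minimal, self-contained sketch of the idea; `fake_get()` is a hypothetical stand-in for `session.get()` so it runs without aiohttp or network access, and the cap of 50 is an arbitrary assumption:

```python
import asyncio

CONCURRENCY = 50  # assumed cap; tune to what the target server tolerates


async def fake_get(sem: asyncio.Semaphore, url: str) -> str:
    async with sem:             # at most CONCURRENCY bodies run at once
        await asyncio.sleep(0)  # stands in for the network round trip
        return f"<html>{url}</html>"


async def main(n_reqs: int) -> list:
    sem = asyncio.Semaphore(CONCURRENCY)
    # gather() preserves input order, so results line up with the URLs
    return await asyncio.gather(
        *(fake_get(sem, f"https://example.com/{i}") for i in range(n_reqs))
    )


results = asyncio.run(main(200))
print(len(results))
```

The same pattern drops into the aiohttp version above by acquiring the semaphore inside get().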
You may speed it up even more by using a custom connector to allow more concurrent connections (the default is 100) in a single session:
async def main(n_reqs: int):
    connector = aiohttp.TCPConnector(limit=0)
    async with aiohttp.ClientSession(connector=connector) as session:
        ...
This worked very well; I received around 250+ requests per second. This solution works on Windows 10. You may have to pip install requests; concurrent.futures is part of the standard library.
import time
import requests
import concurrent.futures

start = int(time.time())  # get time before the requests are sent

urls = []       # input array of URLs/IPs
responses = []  # output: the content of each request as a string, in an array

# create a list of 5000 sites to test with
for y in range(5000):
    urls.append("https://example.com")


def send(url):
    responses.append(requests.get(url).content)


with concurrent.futures.ThreadPoolExecutor(max_workers=10000) as executor:
    futures = []
    for url in urls:
        futures.append(executor.submit(send, url))

end = int(time.time())  # get time after everything finishes

print(str(round(len(urls) / (end - start), 0)) + "/sec")  # average requests per second
Output:
286.0/sec
Note: If your code needs something extremely time-dependent, replace the middle part with this:
with concurrent.futures.ThreadPoolExecutor(max_workers=10000) as executor:
    futures = []
    for url in urls:
        futures.append(executor.submit(send, url))
    for future in concurrent.futures.as_completed(futures):
        responses.append(future.result())
The secret sauce is max_workers=10000. Otherwise, it averages about 80/sec. Setting it to anything over 1000 gave no speed boost, though.
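One caveat with the snippets above: results are appended to responses in completion order, so they won't line up with urls. If the output array has to match the input array, executor.map returns results in input order. A sketch, with a stub fetch() in place of requests.get(url).content so it runs without network access:

```python
import concurrent.futures


def fetch(url):
    # stand-in for requests.get(url).content
    return f"<html>{url}</html>"


urls = [f"https://example.com/{i}" for i in range(100)]

with concurrent.futures.ThreadPoolExecutor(max_workers=100) as executor:
    # map() yields results in the same order as `urls`, regardless of
    # which worker thread finishes first
    responses = list(executor.map(fetch, urls))

print(responses[0])
```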
Hope this helps; the question did ask for the fastest way to send 10,000 HTTP requests.
I observed 15,000 requests in 10 s by capturing localhost with Wireshark, saving the packets to a CSV, and counting only the packets that contained GET in them.

File: a.py
from treq import get
from twisted.internet import reactor


def done(response):
    # on each successful response, immediately fire the next request
    if response.code == 200:
        get("http://localhost:3000").addCallback(done)


get("http://localhost:3000").addCallback(done)
reactor.callLater(10, reactor.stop)
reactor.run()
Run the test like this:
pip3 install treq
python3 a.py # code from above
Set up the test website like this; mine was on port 3000:
mkdir myapp
cd myapp
npm init
npm install express
node app.js
File: app.js
const express = require('express')
const app = express()
const port = 3000

app.get('/', (req, res) => {
  res.send('Hello World!')
})

app.listen(port, () => {
  console.log(`Example app listening on port ${port}`)
})
Output:
grep GET wireshark.csv | head
"5","0.000418","::1","::1","HTTP","139","GET / HTTP/1.1 "
"13","0.002334","::1","::1","HTTP","139","GET / HTTP/1.1 "
"17","0.003236","::1","::1","HTTP","139","GET / HTTP/1.1 "
"21","0.004018","::1","::1","HTTP","139","GET / HTTP/1.1 "
"25","0.004803","::1","::1","HTTP","139","GET / HTTP/1.1 "
grep GET wireshark.csv | tail
"62145","9.994184","::1","::1","HTTP","139","GET / HTTP/1.1 "
"62149","9.995102","::1","::1","HTTP","139","GET / HTTP/1.1 "
"62153","9.995860","::1","::1","HTTP","139","GET / HTTP/1.1 "
"62157","9.996616","::1","::1","HTTP","139","GET / HTTP/1.1 "
"62161","9.997307","::1","::1","HTTP","139","GET / HTTP/1.1 "
I created a package named unparallel that is well suited for your use case. You can use it as follows:
import asyncio
from unparallel import up


async def main():
    urls = [
        "https://www.google.com/",
        "https://www.youtube.com/",
        "https://www.facebook.com/",
        "https://www.wikipedia.org/",
    ]

    # Do GET requests and return the content for all URLs
    responses = await up(urls, response_fn=lambda x: x.text)

    # Iterate over the responses and print the content
    for url, content in zip(urls, responses):
        print(url, content[:100])


if __name__ == "__main__":
    asyncio.run(main())
This is the output I got from running the script above:
❯ python docs/examples/multiple_websites.py
Making async requests: 100%|█████████████████| 4/4 [00:00<00:00, 9.19it/s]
https://www.google.com/: '<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="de-AT"><head><meta cont'
https://www.youtube.com/: '<!DOCTYPE html><html style="font-size: 10px;font-family: Roboto, Arial, sans-serif;" lang="de-DE" da'
https://www.facebook.com/: '<!DOCTYPE html>\n<html lang="de" id="facebook" class="no_js">\n<head><meta charset="utf-8" /><meta nam'
https://www.wikipedia.org/: '<!DOCTYPE html>\n<html lang="en" class="no-js">\n<head>\n<meta charset="utf-8">\n<title>Wikipedia</title'
You can check out the docs to learn more about how to parametrize up().