How do I make Python send as many concurrent HTTP requests as possible?


I'm trying to send HTTPS requests as quickly as possible. I know this has to be done with concurrent requests, since my goal is 150 to 500+ requests per second. I've searched everywhere, but I haven't found a Python 3.11+ answer, nor one that doesn't give me errors. I'm trying to avoid AIOHTTP, because the tedious process of setting it up was painful and it didn't even work.

The input should be an array of URLs and the output an array of HTML strings.

python python-3.x http https concurrency
4 Answers

3 votes

It's really unfortunate that you couldn't get AIOHTTP set up properly, because it's one of the most efficient ways to make async requests in Python.

The setup isn't hard:

import asyncio
import aiohttp
from time import perf_counter


def urls(n_reqs: int):
    # generate n_reqs identical URLs to fetch
    for _ in range(n_reqs):
        yield "https://python.org"

async def get(session: aiohttp.ClientSession, url: str):
    # fetch one URL and read the response body
    async with session.get(url) as response:
        _ = await response.text()

async def main(n_reqs: int):
    # one shared session for all requests, fired off concurrently
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[get(session, url) for url in urls(n_reqs)]
        )


if __name__ == "__main__":
    n_reqs = 10_000
    
    start = perf_counter()
    asyncio.run(main(n_reqs))
    end = perf_counter()
    
    print(f"{n_reqs / (end - start)} req/s")

You basically need to create a ClientSession and then reuse it to send the GET requests. The requests are made concurrently with asyncio.gather(). You could also use the newer asyncio.TaskGroup (Python 3.11+):

async def main(n_reqs: int):
    async with aiohttp.ClientSession() as session:
        async with asyncio.TaskGroup() as group:
            for url in urls(n_reqs):
                group.create_task(get(session, url))

On my dual-core machine, which is more than 7 years old, this easily achieves 500+ requests per second. Contrary to what other answers suggest, this solution does not require spawning thousands of threads, which are expensive.

You can speed this up further with a custom connector, to allow more concurrent connections within a single session (the default is 100):

async def main(n_reqs: int):
    # limit=0 disables the cap on concurrent connections (default is 100)
    connector = aiohttp.TCPConnector(limit=0)
    async with aiohttp.ClientSession(connector=connector) as session:
        ...
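
Since the question asks for an array of HTML strings as output, here is a minimal sketch of how the same approach can collect the response bodies; the fetch/fetch_all names are my own, not from the answer above, and asyncio.gather() returns the results in input order:

import asyncio
import aiohttp


async def fetch(session: aiohttp.ClientSession, url: str) -> str:
    # return the body instead of discarding it
    async with session.get(url) as response:
        return await response.text()

async def fetch_all(urls: list[str]) -> list[str]:
    # one shared session; gather() preserves the order of the input URLs
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(
            *[fetch(session, url) for url in urls]
        )

htmls = asyncio.run(fetch_all(["https://python.org"] * 100))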


1 vote

This works well; I got roughly 250+ requests per second with it. This solution works on Windows 10. You may need to pip install requests for the concurrent requests.

import time
import requests
import concurrent.futures

start = time.time() # get time before the requests are sent

urls = []      # input array of URLs/IPs
responses = [] # output array with the content of each request

# create a list of 5000 sites to test with
for y in range(5000):
    urls.append("https://example.com")

def send(url):
    responses.append(requests.get(url).content)

# exiting the with-block waits for all submitted futures to finish
with concurrent.futures.ThreadPoolExecutor(max_workers=10000) as executor:
    futures = []
    for url in urls:
        futures.append(executor.submit(send, url))

end = time.time() # get time after everything finishes
print(str(round(len(urls) / (end - start), 0)) + "/sec") # average requests per second

Output:

286.0/sec

Note: if your code needs the contents in something extremely time-dependent, replace the middle part with the following (send must return the content rather than append it, otherwise future.result() would be None):

def send(url):
    # return the content so future.result() yields the response body
    return requests.get(url).content

with concurrent.futures.ThreadPoolExecutor(max_workers=10000) as executor:
    futures = []
    for url in urls:
        futures.append(executor.submit(send, url))
    for future in concurrent.futures.as_completed(futures):
        responses.append(future.result())

This is a modified version of an example shown on this site.

The secret sauce is max_workers=10000. Otherwise, it would average around 80/sec. That said, I didn't see any speed gain from setting it higher than 1000.
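
If the output array needs to line up with the input array of URLs, a minimal sketch using executor.map (which, unlike as_completed, yields results in input order) could look like this; the URL list and worker count are placeholders of my own:

import concurrent.futures
import requests

urls = ["https://example.com"] * 100

def fetch(url):
    # one GET per worker thread; return the body as text
    return requests.get(url).text

# executor.map() yields results in the same order as the input URLs
with concurrent.futures.ThreadPoolExecutor(max_workers=100) as executor:
    responses = list(executor.map(fetch, urls))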


0 votes

Hope this helps; this question is asking for the fastest way to send 10,000 HTTP requests.

I observed 15,000 requests in 10 seconds. I captured localhost with Wireshark, saved the packets to CSV, and counted only the packets containing GET.

File: a.py

from treq import get
from twisted.internet import reactor

def done(response):
    # as soon as one request finishes, fire off the next one
    if response.code == 200:
        get("http://localhost:3000").addCallback(done)

# start the request chain
get("http://localhost:3000").addCallback(done)

reactor.callLater(10, reactor.stop) # stop after 10 seconds
reactor.run()
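
Note that a single callback chain keeps only one request in flight at a time; one way to push concurrency higher is to start several independent chains. This is a sketch of my own, not part of the original answer, and the chain count of 50 is arbitrary:

from treq import get
from twisted.internet import reactor

def done(response):
    if response.code == 200:
        get("http://localhost:3000").addCallback(done)

# start 50 independent request chains running concurrently
for _ in range(50):
    get("http://localhost:3000").addCallback(done)

reactor.callLater(10, reactor.stop)
reactor.run()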

Run the test like this:

pip3 install treq
python3 a.py  # code from above

Set up the test website like this (my port is 3000):

mkdir myapp
cd myapp
npm init
npm install express
node app.js

File: app.js

const express = require('express')
const app = express()
const port = 3000

app.get('/', (req, res) => {
  res.send('Hello World!')
})

app.listen(port, () => {
  console.log(`Example app listening on port ${port}`)
})

Output

grep GET wireshark.csv  | head
"5","0.000418","::1","::1","HTTP","139","GET / HTTP/1.1 "
"13","0.002334","::1","::1","HTTP","139","GET / HTTP/1.1 "
"17","0.003236","::1","::1","HTTP","139","GET / HTTP/1.1 "
"21","0.004018","::1","::1","HTTP","139","GET / HTTP/1.1 "
"25","0.004803","::1","::1","HTTP","139","GET / HTTP/1.1 "

grep GET wireshark.csv  | tail
"62145","9.994184","::1","::1","HTTP","139","GET / HTTP/1.1 "
"62149","9.995102","::1","::1","HTTP","139","GET / HTTP/1.1 "
"62153","9.995860","::1","::1","HTTP","139","GET / HTTP/1.1 "
"62157","9.996616","::1","::1","HTTP","139","GET / HTTP/1.1 "
"62161","9.997307","::1","::1","HTTP","139","GET / HTTP/1.1 "


0 votes

I created a package called unparallel that suits your use case. You can use it as follows:

import asyncio

from unparallel import up


async def main():
    urls = [
        "https://www.google.com/",
        "https://www.youtube.com/",
        "https://www.facebook.com/",
        "https://www.wikipedia.org/"
    ]

    # Do GET requests and return the content for all URLs
    responses = await up(urls, response_fn=lambda x: x.text)

    # Iterate over the responses and print the content
    for url, content in zip(urls[:10], responses):
        print(url, content[:100])


if __name__ == "__main__":
    asyncio.run(main())

This is the output I got from running the above:

❯ python docs/examples/multiple_websites.py 
Making async requests: 100%|█████████████████| 4/4 [00:00<00:00,  9.19it/s]
https://www.google.com/: '<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="de-AT"><head><meta cont'
https://www.youtube.com/: '<!DOCTYPE html><html style="font-size: 10px;font-family: Roboto, Arial, sans-serif;" lang="de-DE" da'
https://www.facebook.com/: '<!DOCTYPE html>\n<html lang="de" id="facebook" class="no_js">\n<head><meta charset="utf-8" /><meta nam'
https://www.wikipedia.org/: '<!DOCTYPE html>\n<html lang="en" class="no-js">\n<head>\n<meta charset="utf-8">\n<title>Wikipedia</title'

You can check out the documentation to find out more about how to parametrize up().
