How can I efficiently test which of several HTTP proxies can access a specific domain?

1 Answer · 0 votes

I need an efficient way to test a number of free online HTTP proxies and determine which of them can access a specific website.

Since proxy testing involves a lot of waiting, I decided to redesign the code for asynchronous testing and explored the httpx and aiohttp packages. However, I ran into unexpected behavior that makes me doubt whether my current code is best suited to my purpose.

Below is the output of the three approaches I used:

  • synchronous testing with the requests package,
  • and two more approaches for asynchronous testing.

As you can see, there are multiple errors, and the time needed to complete each request varies widely. Interestingly, the requests approach returned HTTP 200 for four links, the httpx approach returned five, and the aiohttp approach returned none at all, even though they are supposed to perform the same task. This raises doubts about how I implemented them.

Additionally, in the httpx approach, one proxy took an inexplicably long time even though I set the timeout to 60 seconds: it took 13,480.64 seconds. (I should mention that during this test, when I noticed it was taking too long, I put my computer into sleep mode. When I came back later, I found that the process had not stopped and was still running.)
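(For reference: a single-number `timeout` in httpx bounds each I/O phase — connect, read, write — separately rather than the total elapsed time, so a response that keeps trickling bytes can run far past 60 s overall. Wrapping the whole request in `asyncio.wait_for` adds a hard wall-clock cap. A stdlib-only sketch of that pattern, with `slow_response` standing in for a stalled proxy request:)

```python
import asyncio

async def with_deadline(coro, seconds: float):
    # Cancel the wrapped coroutine once `seconds` of event-loop time elapse,
    # regardless of how the HTTP library's own per-phase timeouts behave.
    return await asyncio.wait_for(coro, timeout=seconds)

async def slow_response() -> int:
    await asyncio.sleep(10)  # stands in for a proxy that keeps trickling bytes
    return 200

if __name__ == "__main__":
    try:
        asyncio.run(with_deadline(slow_response(), seconds=0.5))
    except asyncio.TimeoutError:
        print("request aborted at the deadline")
```

Note that `asyncio` timers measure event-loop time, so a sleeping machine can still distort the printed durations; the cap above only bounds time while the loop is actually running.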

Can anyone tell me what I am doing wrong here and how to improve it?

 1) --> 185.XXX.XX.XX:80     --> ProxyError      (4.96s)
 2) --> 38.XX.XXX.XXX:443    --> HTTP (200)      (2.50s)
 3) --> 162.XXX.XX.XXX:80    --> HTTP (200)      (20.92s)
 4) --> 18.XXX.XXX.XXX:8080  --> HTTP (200)      (0.61s)
 5) --> 31.XX.XX.XX:50687    --> ConnectionError (7.88s)
 6) --> 177.XX.XXX.XXX:80    --> ProxyError      (21.07s)
 7) --> 8.XXX.XXX.X:4153     --> HTTP (200)      (4.96s)
 8) --> 146.XX.XXX.XXX:12334 --> ProxyError      (21.05s)
 9) --> 67.XX.XXX.XXX:33081  --> ProxyError      (3.03s)
10) --> 37.XXX.XX.XX:80      --> ReadTimeout     (60.16s)
Testing 10 proxies with "requests" took 147.16 seconds.


 4) --> 18.XXX.XXX.XXX:8080  --> HTTP (200)          (16.09s)
 2) --> 38.XX.XXX.XXX:443    --> HTTP (200)          (22.11s)
 7) --> 8.XXX.XXX.X:4153     --> HTTP (200)          (12.96s)
 1) --> 185.XXX.XX.XX:80     --> RemoteProtocolError (24.83s)
 9) --> 67.XX.XXX.XXX:33081  --> ConnectError        (6.02s)
 3) --> 162.XXX.XX.XXX:80    --> HTTP (200)          (22.48s)
 6) --> 177.XX.XXX.XXX:80    --> HTTP (200)          (26.96s)
 5) --> 31.XX.XX.XX:50687    --> ConnectError        (34.50s)
 8) --> 146.XX.XXX.XXX:12334 --> ConnectError        (27.01s)
10) --> 37.XXX.XX.XX:80      --> ReadError           (13480.64s)
Testing 10 proxies with "httpx" took 13507.80 seconds.


 1) --> 185.XXX.XX.XX:80     --> ClientProxyConnectionError  (1.30s)
 2) --> 38.XX.XXX.XXX:443    --> ClientProxyConnectionError  (0.67s)
 3) --> 162.XXX.XX.XXX:80    --> ClientProxyConnectionError  (0.77s)
 4) --> 18.XXX.XXX.XXX:8080  --> ClientProxyConnectionError  (0.83s)
 5) --> 31.XX.XX.XX:50687    --> ClientProxyConnectionError  (0.85s)
 6) --> 177.XX.XXX.XXX:80    --> ClientProxyConnectionError  (0.91s)
 7) --> 8.XXX.XXX.X:4153     --> ClientProxyConnectionError  (0.94s)
 8) --> 146.XX.XXX.XXX:12334 --> ClientProxyConnectionError  (1.03s)
 9) --> 67.XX.XXX.XXX:33081  --> ClientProxyConnectionError  (1.05s)
10) --> 37.XXX.XX.XX:80      --> ClientProxyConnectionError  (0.62s)
Testing 10 proxies with "aiohttp" took 2.42 seconds.

Here is the code I used:

I first download the proxies from this GitHub repository:

import random
import tempfile
import os
import requests
import time
import asyncio
import httpx
import aiohttp

TIMEOUT: int = 60
DEFAULT_DOMAIN: str = r"www.desired.domain.com"
PROXIES_URL: str = "https://raw.githubusercontent.com/TheSpeedX/SOCKS-List/master/http.txt"
PROXIES_PATH: str = os.path.join(tempfile.gettempdir(), "httpProxies.txt")
HEADERS: dict = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "accept-language": "en,ar;q=0.9,fr;q=0.8",
    "Accept-Encoding": "gzip, deflate",
    "dnt": "1",
    "referer": "https://www.google.com/",
    "sec-ch-ua": '"Microsoft Edge";v="123", "Not:A-Brand";v="8", "Chromium";v="123"',
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"Windows"',
    "sec-fetch-dest": "document",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "cross-site",
    "sec-fetch-user": "?1",
    "upgrade-insecure-requests": "1",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36 Edg/123.0.0.0",
    "Connection": "keep-alive",
}

def get_proxies() -> list[str]:
    proxies: list[str] = []
    if os.path.exists(PROXIES_PATH):
        with open(file=PROXIES_PATH, mode="r") as file:
            proxies = file.read().splitlines()
    else:
        response = requests.get(url=PROXIES_URL)
        if response.status_code == 200:
            with open(file=PROXIES_PATH, mode="w") as file:
                file.write(response.text)
            # splitlines() avoids a trailing empty entry when the text ends with a newline
            proxies = response.text.splitlines()
    return proxies

Here is the method I use to test these proxies sequentially:

def sequential_test(proxies_list: list[str]):
    if proxies_list:
        with requests.Session() as session:
            session.headers = HEADERS
            for i, proxy in enumerate(proxies_list, 1):
                session.proxies = {"http": f"http://{proxy}"}
                try:
                    color = "\033[91m"
                    start = time.perf_counter()
                    response = session.get(url=f"http://{DEFAULT_DOMAIN}", timeout=TIMEOUT)
                    status = f"HTTP ({response.status_code})"
                    if response.status_code == 200:
                        color = "\033[92m"
                except Exception as exception:  # requests.RequestException
                    status = type(exception).__name__
                print(f"{i:>2}) --> {color+proxy:30}\033[0m --> {status:20}\t({time.perf_counter()-start:.2f}s)")

And here is the code I use to test whether the proxies work with the desired website, once with httpx and once with aiohttp:

async def is_alive_httpx(index: int, proxy: str, domain: str = DEFAULT_DOMAIN) -> None:
    proxy_mounts = {"http://": httpx.AsyncHTTPTransport(proxy=f"http://{proxy}"),}
    async with httpx.AsyncClient(
        mounts=proxy_mounts,
        timeout=TIMEOUT,
        headers=HEADERS,
        follow_redirects=True
    ) as session:
        try:
            color = "\033[91m"
            start = time.perf_counter()
            # session.get() merges the client's headers; a hand-built httpx.Request does not
            response = await session.get(url=f"http://{domain}")
            status = f"HTTP ({response.status_code})"
            if response.status_code == 200:
                color = "\033[92m"
        except Exception as exception:  # httpx.HTTPError
            status = type(exception).__name__
        print(f"{index:>2}) --> {color+proxy:30}\033[0m --> {status:20}\t({time.perf_counter()-start:.2f}s)")


async def is_alive_aiohttp(index: int, proxy: str, domain: str = DEFAULT_DOMAIN) -> None:
    async with aiohttp.ClientSession(
        timeout=aiohttp.ClientTimeout(total=TIMEOUT),
        headers=HEADERS
    ) as session:
        try:
            color = "\033[91m"
            start = time.perf_counter()
            # the context manager releases the connection back to the pool
            async with session.get(url=f"http://{domain}", proxy=f"http://{proxy}") as response:
                status = f"HTTP ({response.status})"
                if response.status == 200:
                    color = "\033[92m"
        except Exception as exception:  # aiohttp.ClientError
            status = type(exception).__name__
    print(f"{index:>2}) --> {color+proxy:30}\033[0m --> {status:26}\t({time.perf_counter()-start:.2f}s)")
    await asyncio.sleep(0.25)

Below is the rest of the code. You can run it directly by copying it into your environment (just make sure you have the required packages installed):

async def test_proxies(proxies_list: list[str], func):
    if proxies_list:
        await asyncio.gather(*[func(i, proxy) for i, proxy in enumerate(proxies_list, 1)])


def main():
    proxies = random.sample(get_proxies(), 10)  # get_proxies()[:10]

    start = time.perf_counter()
    sequential_test(proxies)
    print(f'\nTesting {len(proxies)} proxies with "requests" took {time.perf_counter()-start:.2f} seconds.\n')

    start = time.perf_counter()
    asyncio.run(test_proxies(proxies, is_alive_httpx))
    print(f'\nTesting {len(proxies)} proxies with "httpx" took {time.perf_counter()-start:.2f} seconds.\n')

    start = time.perf_counter()
    asyncio.run(test_proxies(proxies, is_alive_aiohttp))
    print(f'\nTesting {len(proxies)} proxies with "aiohttp" took {time.perf_counter()-start:.2f} seconds.\n')


if __name__ == "__main__":
    main()
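(An aside on the `gather` call: launching every probe at once is fine for 10 proxies, but for a full list of thousands a semaphore can bound the number of requests in flight. A stdlib-only sketch of that pattern — `bounded_gather` and `limit` are illustrative names, not part of the code above:)

```python
import asyncio

async def bounded_gather(coros, limit: int = 50):
    # Run at most `limit` coroutines concurrently instead of all at once.
    semaphore = asyncio.Semaphore(limit)

    async def guarded(coro):
        async with semaphore:
            return await coro

    # gather() preserves the input order in its result list
    return await asyncio.gather(*(guarded(c) for c in coros))

if __name__ == "__main__":
    async def square(n: int) -> int:
        await asyncio.sleep(0)  # stands in for a network probe
        return n * n

    print(asyncio.run(bounded_gather([square(n) for n in range(5)], limit=2)))
    # [0, 1, 4, 9, 16]
```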

Tags: python · asynchronous · proxy · aiohttp · httpx
1 Answer
0 votes

To diagnose the errors in the aiohttp code, it is important to print the full exception details, not just the exception's name:

print(exception)

This prints the details of what caused the error.
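For instance (a minimal stdlib sketch; the helper names are illustrative), replacing `type(exception).__name__` with `repr(exception)` keeps both the class name and the message, which is where the wrapped cause of a `ClientProxyConnectionError` shows up:

```python
def short_status(exception: BaseException) -> str:
    return type(exception).__name__   # what the question's code prints

def full_status(exception: BaseException) -> str:
    return repr(exception)            # class name plus the message

try:
    raise ConnectionError("Cannot connect to host 37.XXX.XX.XX:80")
except Exception as exc:
    print(short_status(exc))  # ConnectionError
    print(full_status(exc))   # ConnectionError('Cannot connect to host 37.XXX.XX.XX:80')
```

For even more context, `traceback.format_exc()` inside the `except` block includes the chained "caused by" exceptions as well.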
