I need an efficient way to test some free online HTTP proxies and determine which of them can reach a specific website.
Since proxy testing involves a lot of waiting, I decided to redesign the code to test asynchronously, and explored the httpx and aiohttp packages along the way. However, I ran into unexpected behavior that makes me doubt whether my current code is the best fit for my purpose.
Below is the output of the three approaches I used:
As you can see, there are multiple errors, and the time needed to complete each request varies widely. Interestingly, the requests approach returned HTTP 200 for four links, the httpx approach returned five, and the aiohttp approach returned none at all, even though they are supposed to perform the same task. This casts doubt on how I implemented them.
Moreover, in the httpx approach one proxy took an inexplicably long time even though I set the timeout to 60 seconds: 13,480.64 seconds. (I should mention that during this test, once I noticed it was taking too long, I put my computer into sleep mode; when I came back later, the process had not stopped and was still running.)
1) --> 185.XXX.XX.XX:80 --> ProxyError (4.96s)
2) --> 38.XX.XXX.XXX:443 --> HTTP (200) (2.50s)
3) --> 162.XXX.XX.XXX:80 --> HTTP (200) (20.92s)
4) --> 18.XXX.XXX.XXX:8080 --> HTTP (200) (0.61s)
5) --> 31.XX.XX.XX:50687 --> ConnectionError (7.88s)
6) --> 177.XX.XXX.XXX:80 --> ProxyError (21.07s)
7) --> 8.XXX.XXX.X:4153 --> HTTP (200) (4.96s)
8) --> 146.XX.XXX.XXX:12334 --> ProxyError (21.05s)
9) --> 67.XX.XXX.XXX:33081 --> ProxyError (3.03s)
10) --> 37.XXX.XX.XX:80 --> ReadTimeout (60.16s)
Testing 10 proxies with "requests" took 147.16 seconds.
4) --> 18.XXX.XXX.XXX:8080 --> HTTP (200) (16.09s)
2) --> 38.XX.XXX.XXX:443 --> HTTP (200) (22.11s)
7) --> 8.XXX.XXX.X:4153 --> HTTP (200) (12.96s)
1) --> 185.XXX.XX.XX:80 --> RemoteProtocolError (24.83s)
9) --> 67.XX.XXX.XXX:33081 --> ConnectError (6.02s)
3) --> 162.XXX.XX.XXX:80 --> HTTP (200) (22.48s)
6) --> 177.XX.XXX.XXX:80 --> HTTP (200) (26.96s)
5) --> 31.XX.XX.XX:50687 --> ConnectError (34.50s)
8) --> 146.XX.XXX.XXX:12334 --> ConnectError (27.01s)
10) --> 37.XXX.XX.XX:80 --> ReadError (13480.64s)
Testing 10 proxies with "httpx" took 13507.80 seconds.
1) --> 185.XXX.XX.XX:80 --> ClientProxyConnectionError (1.30s)
2) --> 38.XX.XXX.XXX:443 --> ClientProxyConnectionError (0.67s)
3) --> 162.XXX.XX.XXX:80 --> ClientProxyConnectionError (0.77s)
4) --> 18.XXX.XXX.XXX:8080 --> ClientProxyConnectionError (0.83s)
5) --> 31.XX.XX.XX:50687 --> ClientProxyConnectionError (0.85s)
6) --> 177.XX.XXX.XXX:80 --> ClientProxyConnectionError (0.91s)
7) --> 8.XXX.XXX.X:4153 --> ClientProxyConnectionError (0.94s)
8) --> 146.XX.XXX.XXX:12334 --> ClientProxyConnectionError (1.03s)
9) --> 67.XX.XXX.XXX:33081 --> ClientProxyConnectionError (1.05s)
10) --> 37.XXX.XX.XX:80 --> ClientProxyConnectionError (0.62s)
Testing 10 proxies with "aiohttp" took 2.42 seconds.
Here is the code I used:
I first download the proxies from this GitHub repository:
import random
import tempfile
import os
import requests
import time
import asyncio
import httpx
import aiohttp
TIMEOUT: int = 60
DEFAULT_DOMAIN: str = r"www.desired.domain.com"
PROXIES_URL: str = "https://raw.githubusercontent.com/TheSpeedX/SOCKS-List/master/http.txt"
PROXIES_PATH: str = os.path.join(tempfile.gettempdir(), "httpProxies.txt")
HEADERS: dict = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "accept-language": "en,ar;q=0.9,fr;q=0.8",
    "Accept-Encoding": "gzip, deflate",
    "dnt": "1",
    "referer": "https://www.google.com/",
    "sec-ch-ua": '"Microsoft Edge";v="123", "Not:A-Brand";v="8", "Chromium";v="123"',
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"Windows"',
    "sec-fetch-dest": "document",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "cross-site",
    "sec-fetch-user": "?1",
    "upgrade-insecure-requests": "1",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36 Edg/123.0.0.0",
    "Connection": "keep-alive",
}
def get_proxies() -> list[str]:
    proxies: list[str] = []
    if os.path.exists(PROXIES_PATH):
        # The "with" block closes the file automatically; no explicit close() needed.
        with open(file=PROXIES_PATH, mode="r") as file:
            proxies = file.read().splitlines()
    else:
        response = requests.request(method="GET", url=PROXIES_URL)
        if response.status_code == 200:
            text = response.text
            with open(file=PROXIES_PATH, mode="w") as file:
                file.write(text)
            # splitlines() avoids the trailing empty entry that split("\n") can leave.
            proxies = text.splitlines()
    return proxies
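Since the downloaded list can contain blank or malformed lines, it may also help to validate each ip:port entry before testing. A minimal sketch, assuming a loose shape check is enough (the `PROXY_RE` pattern and `clean_proxies` helper are my own, not part of the original code):

```python
import re

# Loose ip:port shape check; it does not validate octet ranges (assumption:
# the source list only needs filtering of blanks and obvious junk).
PROXY_RE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}:\d{1,5}$")

def clean_proxies(lines: list[str]) -> list[str]:
    # Strip surrounding whitespace and keep only entries that look like ip:port.
    return [line.strip() for line in lines if PROXY_RE.match(line.strip())]

print(clean_proxies(["1.2.3.4:8080", "", "not a proxy", " 5.6.7.8:80 "]))
# -> ['1.2.3.4:8080', '5.6.7.8:80']
```

Calling this on the result of `get_proxies()` would guarantee that an empty line never gets turned into the bogus proxy URL `http://`.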
Here is the method I use to test these proxies sequentially:
def sequential_test(proxies_list: list[str]):
    if proxies_list:
        with requests.Session() as session:
            session.headers = HEADERS
            for i, proxy in enumerate(proxies_list, 1):
                session.proxies = {"http": f"http://{proxy}"}
                try:
                    color = "\033[91m"
                    start = time.perf_counter()
                    response = session.get(url=f"http://{DEFAULT_DOMAIN}", timeout=TIMEOUT)
                    status = f"HTTP ({response.status_code})"
                    if response.status_code == 200:
                        color = "\033[92m"
                except Exception as exception:  # requests.RequestException
                    status = type(exception).__name__
                print(f"{i:>2}) --> {color + proxy:30}\033[0m --> {status:20}\t({time.perf_counter() - start:.2f}s)")
Below is the code I use to test whether a proxy works with the desired website, using httpx and aiohttp respectively:
async def is_alive_httpx(index: int, proxy: str, domain: str = DEFAULT_DOMAIN) -> None:
    proxy_mounts = {"http://": httpx.AsyncHTTPTransport(proxy=f"http://{proxy}")}
    async with httpx.AsyncClient(
        mounts=proxy_mounts,
        timeout=TIMEOUT,
        headers=HEADERS,
        follow_redirects=True,
    ) as session:
        try:
            color = "\033[91m"
            start = time.perf_counter()
            response = await session.send(httpx.Request(method="GET", url=f"http://{domain}"))
            status = f"HTTP ({response.status_code})"
            if response.status_code == 200:
                color = "\033[92m"
        except Exception as exception:  # httpx.HTTPError
            status = type(exception).__name__
        print(f"{index:>2}) --> {color + proxy:30}\033[0m --> {status:20}\t({time.perf_counter() - start:.2f}s)")
async def is_alive_aiohttp(index: int, proxy: str, domain: str = DEFAULT_DOMAIN) -> None:
    async with aiohttp.ClientSession(
        timeout=aiohttp.ClientTimeout(total=TIMEOUT),
        headers=HEADERS,
    ) as session:
        try:
            color = "\033[91m"
            start = time.perf_counter()
            response = await session.get(url=f"http://{domain}", proxy=f"http://{proxy}")
            status = f"HTTP ({response.status})"
            if response.status == 200:
                color = "\033[92m"
        except Exception as exception:  # aiohttp.ClientError
            status = type(exception).__name__
        print(f"{index:>2}) --> {color + proxy:30}\033[0m --> {status:26}\t({time.perf_counter() - start:.2f}s)")
        await asyncio.sleep(0.25)
Here is the rest of the code. You can run it directly by copying it into your environment (just make sure you have the required packages installed):
async def test_proxies(proxies_list: list[str], func):
    if proxies_list:
        await asyncio.gather(*[func(i, proxy) for i, proxy in enumerate(proxies_list, 1)])

def main():
    proxies = random.sample(get_proxies(), 10)  # or: get_proxies()[:10]
    start = time.perf_counter()
    sequential_test(proxies)
    print(f'\nTesting {len(proxies)} proxies with "requests" took {time.perf_counter() - start:.2f} seconds.\n')
    start = time.perf_counter()
    asyncio.run(test_proxies(proxies, is_alive_httpx))
    print(f'\nTesting {len(proxies)} proxies with "httpx" took {time.perf_counter() - start:.2f} seconds.\n')
    start = time.perf_counter()
    asyncio.run(test_proxies(proxies, is_alive_aiohttp))
    print(f'\nTesting {len(proxies)} proxies with "aiohttp" took {time.perf_counter() - start:.2f} seconds.\n')

if __name__ == "__main__":
    main()
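One way to guard against a single check running far past the client timeout, the way proxy 10 did under httpx, is to wrap each coroutine in `asyncio.wait_for`, which cancels the inner task once an overall deadline elapses. Below is a minimal self-contained sketch of that idea; `check_proxy`, `check_with_deadline`, and `HARD_LIMIT` are my own names, and `check_proxy` is a hypothetical stand-in (a plain sleep) for the real test coroutine:

```python
import asyncio

HARD_LIMIT: float = 0.5  # seconds; overall cap per check (tiny here for the demo)

async def check_proxy(proxy: str) -> str:
    # Hypothetical stand-in for is_alive_httpx / is_alive_aiohttp:
    # simulates a request that hangs well past the client timeout.
    await asyncio.sleep(2)
    return "HTTP (200)"

async def check_with_deadline(proxy: str) -> str:
    # wait_for cancels the wrapped coroutine once HARD_LIMIT elapses,
    # so no single proxy can stall the whole run.
    try:
        return await asyncio.wait_for(check_proxy(proxy), timeout=HARD_LIMIT)
    except asyncio.TimeoutError:
        return "TimeoutError"

async def demo() -> None:
    results = await asyncio.gather(*(check_with_deadline(p) for p in ["1.2.3.4:80"]))
    print(results)  # -> ['TimeoutError'] after roughly HARD_LIMIT seconds

asyncio.run(demo())
```

In the real script the same wrapper could be applied inside `test_proxies`, with `HARD_LIMIT` set somewhat above `TIMEOUT` as a last-resort backstop.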
To diagnose the errors from the aiohttp code, it is important to print the full exception details, not just the exception's name:
print(exception)
This prints details about the underlying cause of the error.
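Exceptions like aiohttp's ClientProxyConnectionError typically wrap a lower-level error, so walking the exception chain via `__cause__` and printing `repr()` of each layer exposes the real reason. A minimal sketch, with the failing connection simulated by a raised OSError rather than a real proxy (the `describe` helper is my own name):

```python
def describe(exception: BaseException) -> str:
    # Walk the exception chain (__cause__) and collect each layer's repr(),
    # which shows the class name and message, not just the name.
    parts = []
    exc = exception
    while exc is not None:
        parts.append(repr(exc))
        exc = exc.__cause__
    return " <- ".join(parts)

try:
    try:
        raise OSError("Connection refused")  # simulated low-level failure
    except OSError as low:
        raise ConnectionError("Cannot connect to proxy") from low
except ConnectionError as exception:
    print(describe(exception))
    # -> ConnectionError('Cannot connect to proxy') <- OSError('Connection refused')
```

Logging this string instead of `type(exception).__name__` in the test functions would make it clear whether all ten aiohttp failures share the same root cause.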