I'm writing a proxy checker, but it takes a very long time to run because there are so many proxies to check.
    import requests

    # proxy_api (the URL that returns the proxy list) is defined elsewhere.

    def proxy():
        lives = []
        allproxy = []

        def fetch_proxy():
            # Download the proxy list, one "host:port" entry per line.
            res = requests.get(proxy_api)
            return res.text.splitlines()

        allproxy = fetch_proxy()
        for proxy in allproxy:
            try:
                proxyDictChk = {
                    "https": "https://" + proxy,
                    "http": "http://" + proxy,
                }
                # Checked one at a time, so each dead proxy costs up to 3 seconds.
                res = requests.get("http://httpbin.org/ip", proxies=proxyDictChk, timeout=3)
                print("Proxy is Working")
                lives.append(proxy)
            except Exception as e:
                print("Proxy Dead")
        return lives

    print(proxy())
How can I use multithreading here to make it faster?
PS. Thanks in advance.
The Python documentation has a good example: https://docs.python.org/3/library/concurrent.futures.html
    import concurrent.futures

    # We can use a with statement to ensure threads are cleaned up promptly
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        # Start the checks and mark each future with its proxy.
        # check_proxy takes a single argument, so only the proxy is passed.
        future_to_url = {executor.submit(check_proxy, url): url for url in allproxy}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                is_valid = future.result()
            except Exception as exc:
                print('%r generated an exception: %s' % (url, exc))
            else:
                print('%s is valid: %s' % (url, is_valid))
So you only need to define the function check_proxy.
    def check_proxy(proxy):
        try:
            proxyDictChk = {
                "https": "https://" + proxy,
                "http": "http://" + proxy,
            }
            res = requests.get("http://httpbin.org/ip", proxies=proxyDictChk, timeout=3)
            print("Proxy is Working")
            return True
        except Exception as e:
            print("Proxies Dead!")
            return False
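Essentially, you create an executor and submit a function that does the work you want. When the function finishes, you use the future to get its result.
Also, because a future re-raises any exception from the function when you ask for its result, you don't have to handle exceptions inside the function.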
    def check_proxy(proxy):
        proxyDictChk = {
            "https": "https://" + proxy,
            "http": "http://" + proxy,
        }
        # Any connection error or timeout propagates to the caller via the future.
        res = requests.get("http://httpbin.org/ip", proxies=proxyDictChk, timeout=3)
        return True
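Now the exception can be handled where you call result() on the future. You could also change the return type to something more meaningful.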
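To tie this back to the original goal of returning a list of live proxies, here is a minimal sketch of how the pieces might fit together. The names `collect_live`, `checker`, and the default of 20 workers are my own assumptions for illustration, not anything from the question. `checker` stands in for a function like the exception-raising `check_proxy` above.

```python
import concurrent.futures


def collect_live(proxies, checker, max_workers=20):
    """Return the proxies for which checker(proxy) finishes without raising.

    collect_live and checker are hypothetical names used for this sketch.
    """
    lives = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the proxy it is checking.
        future_to_proxy = {executor.submit(checker, p): p for p in proxies}
        for future in concurrent.futures.as_completed(future_to_proxy):
            proxy = future_to_proxy[future]
            try:
                # result() re-raises whatever the checker raised.
                future.result()
            except Exception:
                continue  # treat any failure as a dead proxy
            lives.append(proxy)
    return lives
```

You would call it as something like `collect_live(allproxy, check_proxy)`. Note that results arrive in completion order, not input order, and that this only works as intended with a checker that raises on failure. With the version of `check_proxy` that catches everything and returns False, every proxy would be counted as live.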