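I am using a ThreadPoolExecutor to run a function against a large number of endpoints. What I don't understand is why it gets slower over time: initially it processes 5,000-6,000 URLs per monitoring "refresh" interval, but that number drops almost linearly. Where is the slowdown coming from? (The hosts are all comparable API endpoints with the same response time.)
It is obviously still much faster than a for loop; I am just curious about the mechanics.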
import time

import requests
from concurrent.futures import ThreadPoolExecutor

hosts = []  # list of 1mil+ endpoints to update
successful_hosts = []

def fn_to_run(host):
    # Record the host if its endpoint answers with HTTP 200.
    r = requests.get(host.url + '/endpoint')
    if r.status_code == 200:
        successful_hosts.append(host)

pool = ThreadPoolExecutor(1500)  # also tried very different numbers from [20:10000]
futures = [pool.submit(fn_to_run, host) for host in hosts]
t_start = time.time()
while True:
    # Re-count the finished futures on every refresh.
    completed_futures = 0
    for future in futures:
        if future.done():
            completed_futures += 1
    t_now = time.time()
    # Elapsed time is divided by 60, so it is reported in minutes.
    print(f"""{completed_futures} hosts checked {round(completed_futures/len(futures)*100,1)}% done in {round((t_now - t_start)/60,1)} minutes""")
    if completed_futures/len(futures) >= 0.999:
        print('finishing and shutting down pool')
        break
    time.sleep(15)  # status refresh interval
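To pin down whether throughput really drops from one interval to the next, one option is to count completions per window as they arrive instead of re-scanning the whole futures list. Below is a minimal sketch of that idea using `as_completed`. The names `run_with_interval_stats` and `fake_check` are hypothetical, and `fake_check` is only a stand-in for the real `requests.get` call so the example runs without a network.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_with_interval_stats(fn, items, max_workers=32, interval=15.0):
    """Run fn over items in a thread pool and count completions per interval.

    Returns (results, per_interval_counts). Windows are approximate: a window
    is only closed when a future completes after the interval has elapsed.
    """
    results = []
    per_interval = []
    count = 0
    window_start = time.monotonic()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fn, item) for item in items]
        # as_completed yields each future as soon as it finishes.
        for future in as_completed(futures):
            # result() re-raises any exception that fn raised in the worker.
            results.append(future.result())
            count += 1
            now = time.monotonic()
            if now - window_start >= interval:
                per_interval.append(count)
                count = 0
                window_start = now
    # Record whatever finished in the last, possibly partial, window.
    per_interval.append(count)
    return results, per_interval


def fake_check(n):
    # Hypothetical stand-in for the HTTP request: sleep briefly, return a bool.
    time.sleep(0.001)
    return n % 2 == 0


results, counts = run_with_interval_stats(
    fake_check, range(200), max_workers=8, interval=0.05
)
print(f"{len(results)} checked, {sum(results)} successful, per-interval counts: {counts}")
```

Collecting results in the main thread also avoids appending to a shared list from the workers. If the per-interval counts stay flat here but fall with the real function, that would point at the requests rather than at the pool itself.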
I tried