What is the fastest way in Python to request a list of URLs and check their bodies?


I have a text file with more than 100,000 URLs. I want to request all of them and check each response for a certain piece of text. My current code does this, but it takes several hours to finish.

Here is my current code:

text = "stackoverflow"
urls = open("urls.txt").read().splitlines()

def fetch_url(url):
    try:
        response = urlopen(url, timeout=2)
        return url, response.read(), None
    except Exception as e:
        return url, None, e

try:
    results = ThreadPool(300).imap_unordered(fetch_url, urls)
except:
    pass

for url, html, error in results:
    if error is None:
        if text.encode() in html:
            print("Found in " + url)
    else:
        print("error %r: %s" % (url, error))
Tags: python, python-3.x, python-requests, python-asyncio, python-multithreading
1 Answer

I don't think it can be made much faster, because the bottleneck is your internet connection. However, the results list grows very large as responses come in, which is poor memory management, so I suggest restructuring it like this:


from urllib.request import urlopen
from multiprocessing.pool import ThreadPool

urls = open("urls.txt").read().splitlines()

def fetch_url(url):
    try:
        response = urlopen(url, timeout=2)
        return url, response.read(), None
    except Exception as e:
        return url, None, e

def check_url(url):
    text = "stackoverflow"
    url, html, error = fetch_url(url)
    if error is None:
        if text.encode() in html:
            print("Found in " + url)
    else:
        print("error %r: %s" % (url, error))

pool = ThreadPool(300)
# imap_unordered is lazy, so iterate over it to make sure every URL is
# actually processed before the program exits.
for _ in pool.imap_unordered(check_url, urls):
    pass
pool.close()
pool.join()

This way you never end up with one huge list of responses in memory.
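
For completeness, and since the question is also tagged python-asyncio: the same check can be sketched with asyncio and the third-party aiohttp package (which is not used in either snippet above). This is only a rough illustration, assuming aiohttp is installed; the 300-connection cap and 2-second timeout mirror the numbers used above.

import asyncio
import aiohttp

TEXT = "stackoverflow"

async def check_url(session, sem, url):
    # The semaphore caps how many requests are in flight at once (here: 300).
    async with sem:
        try:
            async with session.get(url, timeout=aiohttp.ClientTimeout(total=2)) as resp:
                body = await resp.read()
                if TEXT.encode() in body:
                    print("Found in " + url)
        except Exception as e:
            print("error %r: %s" % (url, e))

async def main():
    urls = open("urls.txt").read().splitlines()
    sem = asyncio.Semaphore(300)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(check_url(session, sem, url) for url in urls))

asyncio.run(main())

Whether this is actually faster than the thread pool still depends mostly on network bandwidth, as noted above.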
