Python: socket.timeout not handled by except

Question (votes: 0, answers: 1)

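Sometimes I can handle socket.timeout properly, but at other times I get a socket timeout error and my script stops abruptly. Is something missing from my exception handling? How does the exception get straight past it?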

It happens at random in either of the following pieces of code:

First snippet:

for _ in range(max_retries):
    try:
        req = Request(url, headers={'User-Agent' :'Mozilla/5.0'})
        response = urlopen(req,timeout=5)
        break
    except error.URLError as err: 
        print("URL that generated the error code: ", url)
        print("Error description:",err.reason)
    except error.HTTPError as err:
        print("URL that generated the error code: ", url)
        print("Error code:", err.code)
        print("Error description:", err.reason)
    except socket.timeout:
        print("URL that generated the error code: ", url)
        print("Error description: No response.")
    except socket.error:
        print("URL that generated the error code: ", url)
        print("Error description: Socket error.")

if response.getheader('Content-Type').startswith('text/html'):
    htmlBytes = response.read()
    htmlString = htmlBytes.decode("utf-8")
    self.feed(htmlString)

Second snippet:

for _ in range(max_retries):
    try:
        req = Request(i, headers={'User-Agent' :'Mozilla/5.0'})
        with urlopen(req,timeout=5) as response, open(aux, 'wb') as out_file:
            shutil.copyfileobj(response, out_file)  
        with open(os.path.join(path, fname), 'a') as f:
            f.write(("link" + str(intaux) + "-" + auxstr + str(index) + i[-4:] + " --- " + metadata[index%batch] + '\n'))
        break
    except error.URLError as err:
        print("URL that generated the error code: ", i)
        print("Error description:",err.reason)
    except error.HTTPError as err:
        print("URL that generated the error code: ", i)
        print("Error code:", err.code)
        print("Error description:", err.reason)
    except socket.timeout:
        print("URL that generated the error code: ", i)
        print("Error description: No response.")
    except socket.error:
        print("URL that generated the error code: ", i)
        print("Error description: Socket error.")

Error:

Traceback (most recent call last):
  File "/mydir/crawler.py", line 202, in <module>
    spider("urls.txt", maxPages=0, debug=1, dailyRequests=9600) 
  File "/mydir/crawler.py", line 142, in spider
    parser.getLinks(url + "?start=" + str(currbot) + "&tab=" + auxstr,auxstr)
  File "/mydir/crawler.py", line 81, in getLinks
    htmlBytes = response.read()
  File "/usr/lib/python3.5/http/client.py", line 455, in read
    return self._readall_chunked()
  File "/usr/lib/python3.5/http/client.py", line 561, in _readall_chunked
    value.append(self._safe_read(chunk_left))
  File "/usr/lib/python3.5/http/client.py", line 607, in _safe_read
    chunk = self.fp.read(min(amt, MAXAMOUNT))
  File "/usr/lib/python3.5/socket.py", line 575, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.5/ssl.py", line 929, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.5/ssl.py", line 791, in read
    return self._sslobj.read(len, buffer)
  File "/usr/lib/python3.5/ssl.py", line 575, in read
    v = self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

Edit:

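I noticed that I had left out a few lines of code. Thanks to @tdelaney, I have added them to the code above, and I am posting the solution I wrote. If you post that solution, or if you have a better way to solve it, I will mark your answer as correct.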

Solution:

for _ in range(max_retries):
    try:
        req = Request(url, headers={'User-Agent' :'Mozilla/5.0'})
        response = urlopen(req,timeout=5)
        break
    except error.URLError as err: 
        print("URL that generated the error code: ", url)
        print("Error description:",err.reason)
    except error.HTTPError as err:
        print("URL that generated the error code: ", url)
        print("Error code:", err.code)
        print("Error description:", err.reason)
    except socket.timeout:
        print("URL that generated the error code: ", url)
        print("Error description: No response.")
    except socket.error:
        print("URL that generated the error code: ", url)
        print("Error description: Socket error.")

if response.getheader('Content-Type').startswith('text/html'):
    for _ in range(max_retries):
        try:
            htmlBytes = response.read()
            htmlString = htmlBytes.decode("utf-8")
            self.feed(htmlString)
            break
        except error.URLError as err: 
            print("URL that generated the error code: ", url)
            print("Error description:",err.reason)
        except error.HTTPError as err:
            print("URL that generated the error code: ", url)
            print("Error code:", err.code)
            print("Error description:", err.reason)
        except socket.timeout:
            print("URL that generated the error code: ", url)
            print("Error description: No response.")
        except socket.error:
            print("URL that generated the error code: ", url)
            print("Error description: Socket error.")
python sockets exception timeout urllib
1 answer
0 votes

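The Python Requests library uses its own set of exceptions for errors related to the HTTP protocol as well as to sockets. It maps the exceptions raised by the underlying socket layer to custom exceptions defined in requests.exceptions. Note that the code in the question uses urllib.request rather than Requests, so these exceptions only apply if you switch libraries.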

So the exception raised by this...

import requests

try:
    # requests.get replaces the urllib Request/urlopen pair
    requests.get("http://stackoverflow.com", headers={'User-Agent': 'Mozilla/5.0'}, timeout=5)
except requests.exceptions.Timeout:
    print("Session Timed Out!")

corresponds to the exception raised by this...

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(5)  # set a timeout so that a slow connect raises socket.timeout
try:
    s.connect(("127.0.0.1", 80))
except socket.timeout:
    print("Session Timed Out")
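For the urllib code in the question, the standard-library exception hierarchy matters as well. The short check below is a sketch added for illustration, not part of the original answer. It shows that HTTPError is a subclass of URLError, so an `except error.URLError` clause listed first also catches HTTP errors, and that socket.timeout is an OSError but not a URLError, so it needs its own clause wherever the socket is actually read.

```python
import socket
from urllib import error

# HTTPError derives from URLError: list `except HTTPError` before
# `except URLError`, otherwise the HTTPError branch is never reached.
http_is_url_error = issubclass(error.HTTPError, error.URLError)

# socket.timeout is an OSError, but it is not a URLError, so an
# `except URLError` clause alone does not catch a bare read timeout.
timeout_is_os_error = issubclass(socket.timeout, OSError)
timeout_is_url_error = issubclass(socket.timeout, error.URLError)

print(http_is_url_error, timeout_is_os_error, timeout_is_url_error)
```

In the traceback from the question, the timeout is raised from response.read(), which sits outside the try block, so none of the except clauses can see it.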

Your fixed code...

import requests
from requests.exceptions import ConnectionError, HTTPError, Timeout

for _ in range(max_retries):
    try:
        # By default requests.get also downloads the body inside this call
        response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=5)
        response.raise_for_status()  # raise HTTPError for 4xx/5xx responses
        break
    except HTTPError as err:
        print("URL that generated the error code: ", url)
        print("Error code:", err.response.status_code)
        print("Error description:", err.response.reason)
    except Timeout:
        print("URL that generated the error code: ", url)
        print("Error description: Session timed out.")
    except ConnectionError:
        print("URL that generated the error code: ", url)
        print("Error description: Connection error.")
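
If you would rather stay with urllib, as in the question, the traceback suggests putting response.read() inside the same try block as urlopen, since that is where the timeout was raised. The following is a minimal sketch under that assumption. The helper name fetch_html and its opener parameter are hypothetical, introduced here only for illustration and to make the function easy to test.

```python
import socket
from urllib import error
from urllib.request import Request, urlopen


def fetch_html(url, max_retries=3, timeout=5, opener=urlopen):
    """Return the decoded HTML, or None if every attempt fails
    or the response is not text/html. (Hypothetical helper.)"""
    for _ in range(max_retries):
        try:
            req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
            # Open AND read inside the same try, so a timeout raised
            # during read() is caught and the whole request is retried.
            with opener(req, timeout=timeout) as response:
                content_type = response.getheader('Content-Type') or ''
                if not content_type.startswith('text/html'):
                    return None
                return response.read().decode('utf-8')
        except error.HTTPError as err:  # subclass of URLError, so it goes first
            print("URL that generated the error code: ", url)
            print("Error code:", err.code)
            print("Error description:", err.reason)
        except error.URLError as err:
            print("URL that generated the error code: ", url)
            print("Error description:", err.reason)
        except socket.timeout:
            print("URL that generated the error code: ", url)
            print("Error description: No response.")
        except OSError:  # socket.error is an alias of OSError in Python 3
            print("URL that generated the error code: ", url)
            print("Error description: Socket error.")
    return None
```

Retrying the whole request, rather than calling read() again on the same response, avoids reading from a connection that has already timed out.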