This is part of a web-mining script.
    def printer(q,missing):
        while 1:
            tmpurl=q.get()
            try:
                image=urllib2.urlopen(tmpurl).read()
            except httplib.HTTPException:
                missing.put(tmpurl)
                continue
            wf=open(tmpurl[-35:]+".jpg","wb")
            wf.write(image)
            wf.close()
[`q` is a Queue() of URLs, and `missing` is an empty queue that collects the URLs that raise errors.]
It is run in parallel by 10 threads.
Every time I run it, I get this:
      File "C:\Python27\lib\socket.py", line 351, in read
        data = self._sock.recv(rbufsize)
      File "C:\Python27\lib\httplib.py", line 541, in read
        return self._read_chunked(amt)
      File "C:\Python27\lib\httplib.py", line 592, in _read_chunked
        value.append(self._safe_read(amt))
      File "C:\Python27\lib\httplib.py", line 649, in _safe_read
        raise IncompleteRead(''.join(s), amt)
    IncompleteRead: IncompleteRead(5274 bytes read, 2918 more expected)
But I do use except...
I also tried catching other exceptions, such as
    httplib.IncompleteRead
    urllib2.URLError
and even
    image=urllib2.urlopen(tmpurl,timeout=999999).read()
but none of these work.
How can I catch IncompleteRead and URLError?
I think the correct answer to this question depends on what you consider an "error-raising URL".
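If you take that to mean any URL whose download fails, one option is to catch several exception types in a single except clause by listing them in a tuple. The sketch below is only an illustration under that assumption: the helper name `fetch`, the `opener` parameter and the `DOWNLOAD_ERRORS` tuple are hypothetical names, not part of the original script, and the import fallback is only there so the same snippet also runs on Python 3.

```python
import socket

try:  # Python 2, as used in the question
    import httplib
    import urllib2
except ImportError:  # Python 3 equivalents bound to the same names
    import http.client as httplib
    import urllib.request as urllib2

# Exceptions treated as "this URL failed". Which ones belong here is an
# assumption; adjust the tuple to your own definition of an error.
DOWNLOAD_ERRORS = (httplib.HTTPException, urllib2.URLError, socket.error)


def fetch(url, opener=urllib2.urlopen):
    """Return the response body, or None if the download failed.

    `opener` is a hypothetical hook so the function can be tried
    without network access; it defaults to urllib2.urlopen.
    """
    try:
        # Both urlopen() and read() are inside the try block, because
        # IncompleteRead is raised while reading, not while opening.
        return opener(url).read()
    except DOWNLOAD_ERRORS:
        return None
```

Inside the worker loop, a `None` result from `fetch(tmpurl)` would then be the signal to call `missing.put(tmpurl)` and `continue`, instead of writing the file.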