How do I resolve the "ValueError: read of closed file" exception?

Question (votes: 8, answers: 2)

This simple Python 3 script:

import urllib.request

host = "scholar.google.com"
link = "/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0"
url = "http://" + host + link
filename = "cite0.bib"
print(url)
urllib.request.urlretrieve(url, filename)

raises this exception:

Traceback (most recent call last):
  File "C:\Users\ricardo\Desktop\Google-Scholar\BibTex\test2.py", line 8, in <module>
    urllib.request.urlretrieve(url, filename)
  File "C:\Python32\lib\urllib\request.py", line 150, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "C:\Python32\lib\urllib\request.py", line 1597, in retrieve
    block = fp.read(bs)
ValueError: read of closed file

I thought this might be a temporary problem, so I added some simple exception handling, like this:

import random
import time
import urllib.request

host = "scholar.google.com"
link = "/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0"
url = "http://" + host + link
filename = "cite0.bib"
print(url)
while True:
    try:
        print("Downloading...")
        time.sleep(random.randint(0, 5))
        urllib.request.urlretrieve(url, filename)
        break
    except ValueError:
        pass

But this just prints Downloading... forever.

python python-3.x urllib
2 answers

5 votes

Your URL returns a 403 error, and urllib.request.urlretrieve is apparently not good at detecting all HTTP errors: it uses urllib.request.FancyURLopener, which tries to swallow the error by returning a urlinfo (response) object instead of raising an exception.

As for a fix, if you still want to use urlretrieve, you can override FancyURLopener like this (the code also makes the error visible):

import urllib.request
from urllib.request import FancyURLopener


class FixFancyURLOpener(FancyURLopener):

    def http_error_default(self, url, fp, errcode, errmsg, headers):
        # Raise on 403 instead of silently returning a response object
        if errcode == 403:
            raise ValueError("403")
        return super(FixFancyURLOpener, self).http_error_default(
            url, fp, errcode, errmsg, headers
        )

# Monkey Patch
urllib.request.FancyURLopener = FixFancyURLOpener

url = "http://scholar.google.com/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0"
urllib.request.urlretrieve(url, "cite0.bib")

Otherwise, and this is what I recommend, use urllib.request.urlopen, which raises urllib.error.HTTPError for a 403 response instead of swallowing it.
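The code that originally accompanied the urlopen recommendation is not preserved here, so the following is only a minimal sketch of that approach, not the answerer's own code. The helper name download and its opener parameter are hypothetical; opener exists only so the function can be exercised without network access. FancyURLopener has also been deprecated since Python 3.3, which is another reason to prefer urlopen.

```python
import shutil
import urllib.error
import urllib.request


def download(url, filename, opener=urllib.request.urlopen):
    """Save the body of url to filename; return True on success, False on an HTTP error.

    `download` and `opener` are illustrative names, not part of urllib.
    """
    try:
        # The output file is only created once the request has succeeded
        with opener(url) as response, open(filename, "wb") as out:
            shutil.copyfileobj(response, out)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses, so a 403 shows up here
        print("Download failed: HTTP", err.code, err.reason)
        return False
    return True


# Example call (performs a real request):
# download("http://scholar.google.com/scholar.bib?q=...", "cite0.bib")
```

Because a 403 is not a transient failure, retrying in a loop as in the question would not help; the return value lets the caller decide whether to stop.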

0 votes

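If you run your application on hosted cloud infrastructure or behind a managed security service, check for restrictions those may impose. This happened to me. Cloud providers sometimes enforce a whitelist of reachable sites.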
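One rough way to tell the two situations apart is to check whether the request gets any HTTP response at all. The sketch below is only an illustration under that assumption: probe and its opener parameter are hypothetical names, and a filtering proxy may itself answer with an HTTP status, so the result is a hint rather than proof.

```python
import urllib.error
import urllib.request


def probe(url, opener=urllib.request.urlopen, timeout=10):
    """Describe how a request to url ends (illustrative helper, not part of urllib)."""
    try:
        with opener(url, timeout=timeout):
            return "reachable"
    except urllib.error.HTTPError as err:
        # Something answered with an HTTP status (HTTPError subclasses URLError,
        # so it has to be caught first)
        return "server answered with HTTP %d" % err.code
    except urllib.error.URLError as err:
        # No HTTP response at all: DNS failure, refused or filtered connection, etc.
        return "no HTTP response (%s)" % err.reason
```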

