I followed the instructions here to make a URL request: https://docs.python.org/3/howto/urllib2.html
However, I cannot get the request to succeed.
import urllib.request
import urllib.parse

URL = 'http://zhishi.me/api/entity/'
term_str = '瓦朗谢讷足球俱乐部'
encoded_url = urllib.parse.quote_plus(URL + term_str)
with urllib.request.urlopen(encoded_url) as response:
    html = response.read()
    html = html.decode('utf-8')
    print(html)
This produces the following error:
Traceback (most recent call last):
  File "/home/martin/nlp/baike/downloader/test.py", line 7, in <module>
    with urllib.request.urlopen(encoded_url) as response:
  File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.6/urllib/request.py", line 511, in open
    req = Request(fullurl, data)
  File "/usr/lib/python3.6/urllib/request.py", line 329, in __init__
    self.full_url = url
  File "/usr/lib/python3.6/urllib/request.py", line 355, in full_url
    self._parse()
  File "/usr/lib/python3.6/urllib/request.py", line 384, in _parse
    raise ValueError("unknown url type: %r" % self.full_url)
ValueError: unknown url type: 'http%3A%2F%2Fzhishi.me%2Fapi%2Fentity%2F%E7%93%A6%E6%9C%97%E8%B0%A2%E8%AE%B7%E8%B6%B3%E7%90%83%E4%BF%B1%E4%B9%90%E9%83%A8'
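I worked out how to fix it. Percent-encode only the search term rather than the whole URL, so that the http:// scheme stays intact, and then join the encoded term to the base URL: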
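import urllib.request
import urllib.parse

base_url = 'http://zhishi.me/api/entity/'
# Encode only the term; the scheme and host in base_url are left untouched.
para = urllib.parse.quote('瓦朗谢讷足球俱乐部')
url = urllib.parse.urljoin(base_url, para)
with urllib.request.urlopen(url) as response:
    html = response.read()
    html = html.decode('utf-8')
    print(html)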
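To see why the first attempt fails and the second works, the two encodings can be compared without making any network request. The following is only a minimal sketch: the helper name build_entity_url is hypothetical (it is not part of urllib), and it uses plain string concatenation instead of urljoin as an illustrative choice.

```python
import urllib.parse

BASE_URL = 'http://zhishi.me/api/entity/'
TERM = '瓦朗谢讷足球俱乐部'


def build_entity_url(base_url, term):
    """Hypothetical helper: percent-encode only the term and append it."""
    # safe='' also encodes '/', so the term always stays one path segment.
    return base_url + urllib.parse.quote(term, safe='')


# Encoding the whole URL also escapes ':' and '/', so no scheme survives.
whole_encoded = urllib.parse.quote_plus(BASE_URL + TERM)
# Encoding only the term keeps 'http://zhishi.me/api/entity/' as written.
entity_url = build_entity_url(BASE_URL, TERM)

print(repr(urllib.parse.urlsplit(whole_encoded).scheme))  # ''
print(repr(urllib.parse.urlsplit(entity_url).scheme))     # 'http'
print(entity_url)
```

The empty scheme in the first case is what urlopen reports as "unknown url type". Concatenation is used here because urljoin replaces the last path segment when the base URL has no trailing slash; with the trailing slash present, as in the post, both approaches give the same result.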