'ascii' codec can't encode character '\xeb' in position 25: ordinal not in range(128) when using urlopen(req).read()

Question

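I am trying to automatically retrieve image links for news articles. I wrote a Python module, imageprocessor, with a getimage function that finds the image link for a news article: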

req = Request('http://top-channel.tv/artikull.php?id=264806&ref=fp', headers={'User-Agent': 'Mozilla/5.0'})
c = urlopen(req).read()
soup = BeautifulSoup(c)
m = soup.find('link', {'rel': 'image_src'})
return m['href']

When I run it from the shell, it works fine:

import imageprocessor
img=imageprocessor.getimage('http://top-channel.tv/artikull.php?id=264806&ref=fp','Top Channel')
img
'http://www.top-channel.tv/foto/lajme/ELBASA-NDERTIMET-07_17.jpg'

The problem is that when I call this function the same way from the views.py module (Django framework), the browser shows this error message:

UnicodeEncodeError at /fillimi/

'ascii' codec can't encode character '\xeb' in position 25: ordinal not in range(128)

It seems to me that c = urlopen(req).read() returns an ASCII-encoded string. I tried:

img=img.encode('utf-8')

But that did not help.
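For reference, this kind of message can be reproduced in Python 3 with a minimal sketch. It assumes the failure comes from some code path encoding a str with the ASCII codec; the thread does not confirm where that happens in the Django view. The helper name ascii_encode_error and the sample string are made up for illustration.

```python
def ascii_encode_error(text):
    """Return the UnicodeEncodeError message for text, or None if it is pure ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


# Hypothetical sample string containing 'ë' (U+00EB) at index 7.
message = ascii_encode_error("Elbasanë")
print(message)
```

Calling encode('utf-8') on such a string succeeds and returns bytes, so the ASCII encoding step must be happening somewhere else in the call chain.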

Answer 1

It seems you have to decode your string first. Try this:

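img = urllib.urlopen(link).read()      # raw bytes (Python 2 API; urllib.request.urlopen in Python 3)
img = img.decode(<source encoding>)    # placeholder: the encoding the page actually uses
img = img.encode("utf8")               # re-encode the decoded text as UTF-8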

For example:

img = b'\xa0'                      # one raw byte: a non-breaking space in windows-1252
img = img.decode("windows-1252")   # bytes -> text
img = img.encode("utf8")           # text -> UTF-8 bytes
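The same decode-first idea in Python 3 might look like the sketch below. It assumes the server declares its charset in the Content-Type header and falls back to UTF-8 otherwise. The names decode_body and fetch_text are hypothetical, not part of the asker's imageprocessor module.

```python
from urllib.request import Request, urlopen


def decode_body(raw, declared_charset=None, fallback="utf-8"):
    """Decode raw bytes to str using the declared charset, else the fallback."""
    encoding = declared_charset or fallback
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
    return raw.decode(encoding, errors="replace")


def fetch_text(url):
    """Fetch a page and return its body as str (performs network I/O)."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req) as resp:
        # get_content_charset() returns None when no charset is declared.
        charset = resp.headers.get_content_charset()
        return decode_body(resp.read(), charset)
```

An unknown charset name from the server would still raise LookupError here; handling that is left out to keep the sketch short.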

Answer 2

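I ran into a similar error. It turned out that the incoming string was encoded in DOS-Latin. In your case it could of course be some other exotic or obsolete encoding.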
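When the source encoding is unknown, one rough approach is to try a few candidates in order, as sketched below. Mapping "DOS-Latin" to Python's cp850 codec is an assumption here, and decode_with_candidates is a hypothetical helper. Single-byte codecs such as cp850 accept almost any input, so the order of candidates matters and a wrong guess yields garbled text rather than an error.

```python
def decode_with_candidates(raw, candidates=("utf-8", "cp850", "latin-1")):
    """Return (text, encoding_name) for the first candidate that decodes raw."""
    for name in candidates:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue  # this candidate cannot represent the bytes; try the next
    raise ValueError("none of the candidate encodings matched")


# 0x89 is not valid UTF-8 on its own, but it is 'ë' in cp850.
text, used = decode_with_candidates(b"\x89")
print(text, used)
```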
