UnicodeEncodeError and TypeError: can only concatenate str (not "bytes") to str

Question

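I am trying to fetch search results with the Google Custom Search API. When I search for text stored in a variable instead of typing the query by hand, I get UnicodeEncodeError: 'ascii' codec can't encode character '\xa2' in position 104: ordinal not in range(128). When I work around it with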

    .encode('ascii', 'ignore').decode('ascii')

a different error appears in the Google Custom Search code:

    TypeError: can only concatenate str (not "bytes") to str.

PS: I have also tried things like str() and .decode.

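Edit: The input stored in the variable comes from Pytesseract, which reads the text in an image. I store that text in a variable and then try to search for it with the Google Custom Search API. Because that raised a Unicode error, I looked for a solution on Stack Overflow and found that I could try calling .decode on the variable to make the problem go away. That did fix the first error, but now a second one appears, a TypeError: can only concatenate str (not "bytes") to str. So I cannot use .decode, because it produces another error. What can I do?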

Edit 2.0

    text_photo = pytesseract.image_to_string(img2)  # read the text in the image into a variable
    text_photo = text_photo.replace('\r', '').replace('\n', '')  # strip carriage returns and newlines

    rawData = urllib.request.urlopen(url_google_1 + text_photo1 + '+' + text_photo2 + url_google_2).read()

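url_google_1 contains the first part of the Google search URL (API key, ...), and url_google_2 contains the rest, which says what I want to get back from Google. In between I add the variables, because they hold what I want to search for. If I write "hello" by hand it works perfectly. The problem is that the text Tesseract produces is in an incompatible format. I tried str(text_photo) and .decode, but neither works.

    json_data = json.loads(rawData)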

Answer 1

I don't understand every detail of your specific problem, but I am fairly sure the root cause is the following:

Python 3 distinguishes two string types, str and bytes, which look similar but are not compatible.

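Once you understand what that means, what each of them can and cannot do, and how to get from one to the other, I am sure you can work out how to build the URL for the API call correctly.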

Different types, not compatible:

>>> type('abc'), type(b'abc')
(<class 'str'>, <class 'bytes'>)

>>> 'abc' + b'abc'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be str, not bytes

>>> b'abc' + 'abc'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat str to bytes

If you want to combine them, you need to convert everything to the same type. To convert, encode str to bytes and decode bytes to str:

>>> 'abc'.encode()
b'abc'
>>> b'abc'.decode()
'abc'

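The str.encode and bytes.decode methods take an optional encoding= parameter, which defaults to UTF-8. This parameter defines the mapping between the characters in a str and the octets in a bytes object. If mapping characters to bytes with the given encoding fails, you get a UnicodeEncodeError. This happens when you use characters that are not defined in the given mapping: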

>>> '5 £'.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\xa3' in position 2: ordinal not in range(128)

Similarly, if some text was encoded with encoding X but you try to decode it with encoding Y, you may see a UnicodeDecodeError:

>>> b = '5 £'.encode('utf8')
>>> b
b'5 \xc2\xa3'
>>> b.decode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 2: ordinal not in range(128)
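
As a small illustration of the mismatch above (the variable names exist only for this sketch): decoding with the codec that was used for encoding recovers the text, while a different codec that happens to accept every byte, such as Latin-1, raises nothing and quietly returns garbled text.

```python
# Bytes produced by encoding with UTF-8.
data = '5 £'.encode('utf-8')

# Decoding with the same codec round-trips the text.
same_codec = data.decode('utf-8')

# Latin-1 maps every byte to a character, so no exception is raised,
# but the two UTF-8 bytes of '£' come back as two separate characters.
wrong_codec = data.decode('latin-1')

print(repr(same_codec))
print(repr(wrong_codec))
```

So the absence of an exception does not prove that the right codec was chosen.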

You can avoid the exceptions with the errors="ignore" strategy, but you will lose information this way:

>>> '5 £'.encode('ascii', errors='ignore')
b'5 '
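
For comparison, here is a short sketch of two other standard error handlers, followed by the pattern from the question. Note that .encode(...) always returns bytes, so dropping the final .decode('ascii') and then concatenating the result with a str is one way to end up with the TypeError from the question. The variable names are illustrative.

```python
text = '5 £'

# Other built-in handlers keep a visible trace of the lost character.
replaced = text.encode('ascii', errors='replace')
escaped = text.encode('ascii', errors='backslashreplace')

# Round trip back to str: the result can be concatenated with other str values.
ascii_only = text.encode('ascii', errors='ignore').decode('ascii')
url_part = 'q=' + ascii_only

# Without the final .decode(), the value is bytes and 'q=' + value fails.
still_bytes = text.encode('ascii', errors='ignore')
try:
    'q=' + still_bytes
except TypeError as exc:
    print(type(exc).__name__)
```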

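In general, if you work with text, use str everywhere. You also shouldn't need to call .encode/.decode directly very often; file handlers and the like usually accept str and convert it to bytes behind the scenes.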
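
A minimal sketch of that behind-the-scenes conversion, using a temporary file (the file name and variable names are made up for the example): in text mode you only ever handle str, and the file object does the encoding and decoding according to the encoding= argument.

```python
import os
import tempfile

text = '5 £'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'price.txt')

    # Text mode: pass str in, the file object encodes it as UTF-8.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)

    # Binary mode shows the bytes that actually landed on disk.
    with open(path, 'rb') as f:
        raw = f.read()

    # Text mode again: the file object decodes the bytes back to str.
    with open(path, 'r', encoding='utf-8') as f:
        round_tripped = f.read()

print(repr(raw))
print(repr(round_tripped))
```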

In your case, you need to find out where and why str and bytes get mixed, and then make sure everything has the same type before concatenating.
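
One possible way to keep everything as str while still producing an ASCII-only URL is to percent-encode the search terms with urllib.parse.quote_plus instead of calling .encode on them. This is a sketch under assumptions: build_search_url is a hypothetical helper, the example.invalid prefix and the num=1 suffix merely stand in for url_google_1 and url_google_2 from the question, and the two terms stand in for the Tesseract output.

```python
from urllib.parse import quote_plus


def build_search_url(prefix: str, suffix: str, *terms: str) -> str:
    """Join percent-encoded search terms with '+' between prefix and suffix.

    Hypothetical helper: every argument and the return value are str,
    so no str/bytes mixing can occur.
    """
    query = '+'.join(quote_plus(term) for term in terms)
    return prefix + query + suffix


# Placeholder URL parts (assumptions, not the real API URL from the question).
url = build_search_url(
    'https://example.invalid/search?q=',
    '&num=1',
    '5 £',
    '50 ¢',
)

# The result contains only ASCII characters, so this does not raise.
url.encode('ascii')
print(url)
```

The non-ASCII characters survive as UTF-8 percent escapes, so nothing is thrown away as it would be with errors='ignore'.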
