When I execute the following code:
import mysql.connector
import numpy as np
connection = mysql.connector.connect(...) # connection params here
cursor = connection.cursor()
cursor.execute('create table test_table(value blob)')
cursor.execute('insert into test_table values (_binary %s)', (np.random.sample(10000).astype('float').tobytes(),))
cursor.execute('select * from test_table')
cursor.fetchall()
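I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf7 in position 1: invalid start byte
(... followed by a stack trace that I don't think is useful here)
It seems the MySQL connector tries to convert my blob to a string (and fails to do so). How can I fetch this data as bytes without any conversion?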
Apparently this is a known issue with the Python 'mysql' module. Try using 'pymysql' instead.
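We ran into the same problem of BLOBs being mistakenly read back as UTF-8 strings with MySQL 8.0.13, mysql-connector-python 8.0.13 and sqlalchemy 1.2.14.
What did the trick for us was to enable the use_pure option of MySQL Connector. The default value of use_pure changed in 8.0.11, and the new default is to use the C extension. So we set the option back: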
create_engine(uri, connect_args={'use_pure': True}, ...)
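If you are not going through SQLAlchemy, the same option can be passed straight to mysql.connector.connect(). Here is a minimal sketch, assuming mysql-connector-python is installed and a server is reachable; the helper names pure_connect_args and fetch_blob_rows are made up for illustration and are not part of the connector:

```python
def pure_connect_args(**params):
    """Return a copy of the connection kwargs with use_pure forced to True."""
    args = dict(params)
    args['use_pure'] = True  # pure-Python implementation instead of the C extension
    return args


def fetch_blob_rows(query, **params):
    """Run a query and return all rows (hypothetical helper, needs a live server)."""
    import mysql.connector  # imported lazily so pure_connect_args works without the driver

    connection = mysql.connector.connect(**pure_connect_args(**params))
    try:
        cursor = connection.cursor()
        cursor.execute(query)
        return cursor.fetchall()
    finally:
        connection.close()
```

Something like fetch_blob_rows('select * from test_table', user=..., password=..., host=..., database=...) would then run the query from the question through the pure-Python code path.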
Details of our error and stack trace:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9c in position 1: invalid start byte
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
....
File "/usr/local/lib/python3.6/site-packages/mysql/connector/cursor_cext.py", line 272, in execute
self._handle_result(result)
File "/usr/local/lib/python3.6/site-packages/mysql/connector/cursor_cext.py", line 163, in _handle_result
self._handle_resultset()
File "/usr/local/lib/python3.6/site-packages/mysql/connector/cursor_cext.py", line 651, in _handle_resultset
self._rows = self._cnx.get_rows()[0]
File "/usr/local/lib/python3.6/site-packages/mysql/connector/connection_cext.py", line 273, in get_rows
row = self._cmysql.fetch_row()
SystemError: <built-in method fetch_row of _mysql_connector.MySQL object at 0x5627dcfdf9f0> returned a result with an error set
Traceback (most recent call last):
File "demo.py", line 16, in <module>
cursor.execute(query, ())
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte '0xff ... '
in position 0: invalid start byte
Using versions:
$ python --version
Python 2.7.10
>>> mysql.connector.__version__
'8.0.15'
With this Python code:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import mysql.connector
conn = mysql.connector.connect(
    user='asdf',
    password='asdf',
    host='1.2.3.4',
    database='the_db',
    connect_timeout=10)
cursor = conn.cursor(buffered=True)  # error is raised here
try:
    query = ("SELECT data_blob FROM blog.cmd_table")
    cursor.execute(query, ())
except mysql.connector.Error as err:  # error is caught here
    # error is caught here, and printed:
    print(err)  # printed thustly
Using a Python variable holding raw binary bytes, populated by Python's open() like this:
def read_file_as_blob(filename):
    # r stands for read
    # b stands for binary
    with open(filename, 'rb') as f:
        data = f.read()
    return data
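So the problem lies somewhere in the chain of encoding conversions: the data in the file -> the encoding of the data for the MySQL blob -> and how MySQL lifts that blob out and converts it back to utf-8.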
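To see why the decode step blows up, here is a small sketch that needs no database. It packs a few floats into raw bytes (using struct rather than numpy, purely to keep it dependency-free) and then tries to read them as UTF-8 text, which is what the connector appears to be doing with the blob:

```python
import struct

# Raw binary payload: three little-endian doubles, 24 bytes in total.
raw = struct.pack('<3d', 0.1, 0.2, 0.3)

try:
    raw.decode('utf-8')  # treat arbitrary binary data as text
    decoded_ok = True
except UnicodeDecodeError as err:
    decoded_ok = False
    print(err)

# Kept as bytes, the same payload unpacks back to the original values.
values = struct.unpack('<3d', raw)
```

The bytes themselves are fine; the failure only appears once something insists on interpreting them as UTF-8.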
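Solution 1 is exactly what AHalvar said: set the use_pure=True parameter and pass it to mysql.connector.connect( ... ). Then, mysteriously, it just works. But good programmers will note that deferring to mystical incantations is a bad code smell. Fixing things by Brownian motion incurs technical debt.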
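Solution 2 is to encode your data early and often, and to prevent the double encoding and double decoding that are the source of these problems. Lock the data down to a common encoding format as soon as possible.
The gratifying solution for me was to force utf-8 encoding earlier in the process. Enforce UTF-8 everywhere.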
data.encode('UTF-8')
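The Unicode pile of poo represents my opinion of this babysitting of character encodings across various devices with different operating systems and encoding schemes.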
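One caveat on the encode call, for what it's worth: .encode('UTF-8') is a text operation. In Python 3 it exists on str but not on bytes, so it only applies when data is a text string rather than raw binary read with 'rb'. A minimal sketch of the distinction, with made-up sample values:

```python
text = 'caf\u00e9'                       # a text string containing a non-ASCII character
encoded = text.encode('UTF-8')           # str -> bytes
roundtrip = encoded.decode('UTF-8')      # bytes -> str, lossless for text

blob = bytes([0xff, 0xd8, 0xff, 0xe0])   # arbitrary binary data, not valid UTF-8
bytes_have_encode = hasattr(blob, 'encode')  # False on Python 3
```

So forcing UTF-8 early helps when the column really holds text; for arbitrary binary payloads the bytes have to be carried through untouched.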