Need help handling large MySQL transactions

Problem description (votes: 0, answers: 1)

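I'm fairly new to this sort of thing, so I apologize for my ignorance. My team and I are writing an unsupervised, non-parametric method for clustering streaming elements, and we store the feature vectors in a MySQL 5.7 database for later use. The algorithm splits each feature vector into several components (the detailed reasons are not important for this question), so several vectors are stored for each incoming element, in the table that describes those elements. The vectors vary in size, but I believe the largest is about 1x60000. We store every feature vector in the same table to keep our database design in fourth normal form.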

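The problem is that when we insert these vectors automatically, we end up with transactions whose locks are held for an unusually long time. I thought this might be a deadlock, but I can't figure out where the deadlock would be. The operations should not block one another indefinitely (order of operations: insert general information into the information table -> insert the feature vector for component 1 into the information table -> insert the feature vector for component 2 into the information table -> ...). Note that even if one of the transactions blocks, it should eventually resolve and the next transaction should start. That leads me to my next hypothesis: each individual transaction is too large, so the table stays locked for a long time. I have heard this can cause problems even when there is no logical reason for a deadlock. Below is the result of running SHOW ENGINE INNODB STATUS; in the database Docker container. Any help or advice would be greatly appreciated.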

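Additional information: we use Python 3.7 with numpy. Each feature vector is a numpy array object that is stringified and stored in the database as a LONGBLOB. Because of the underlying algorithm, the information is stored sequentially, which also reduces the load on the DB server (we were hoping to avoid this problem from the start). We have considered dropping the vector storage mechanism altogether, but keeping it makes many later operations and algorithms significantly faster (not to mention easier to write), because they can simply recall the stored vectors instead of rebuilding them from the raw database information about each incoming item's feature set (imagine recomputing one-hot vectors for every batch of the training set... not ideal).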

------------
TRANSACTIONS
------------
Trx id counter 5848
Purge done for trx's n:o < 5800 undo n:o < 0 state: running but idle
History list length 66
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421274805708992, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805702552, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805704392, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805703472, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805701632, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805699792, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805698872, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805694272, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805693352, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805692432, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805688752, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805687832, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805685072, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805684152, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805683232, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805682312, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805680472, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805678632, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805677712, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805676792, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805675872, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805686912, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805697952, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805697032, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805696112, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805695192, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805691512, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805690592, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805689672, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805685992, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421274805681392, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 5845, ACTIVE 1960 sec
6 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 1
MySQL thread id 18460, OS thread handle 139799096174336, query id 110111 172.18.0.17 root
---TRANSACTION 5839, ACTIVE 3374 sec
1 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 18561, OS thread handle 139799094281984, query id 110572 172.18.0.20 root
Trx read view will not see trx with id >= 5840, sees < 5799
---TRANSACTION 5838, ACTIVE 3476 sec
1 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 18540, OS thread handle 139799094552320, query id 108421 172.18.0.20 root
Trx read view will not see trx with id >= 5839, sees < 5799
---TRANSACTION 5837, ACTIVE 3577 sec
1 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 18519, OS thread handle 139799095092992, query id 108310 172.18.0.20 root
Trx read view will not see trx with id >= 5838, sees < 5799
---TRANSACTION 5835, ACTIVE 3628 sec
1 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 18497, OS thread handle 139799094822656, query id 108175 172.18.0.20 root
Trx read view will not see trx with id >= 5835, sees < 5799
---TRANSACTION 5802, ACTIVE 3832 sec
1 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 11, OS thread handle 139799379732224, query id 107906 172.18.0.20 root
Trx read view will not see trx with id >= 5799, sees < 5796
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] ,
 ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 0; buffer pool: 0
501 OS file reads, 8143 OS file writes, 4776 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 34673, node heap has 2 buffer(s)
Hash table size 34673, node heap has 2 buffer(s)
Hash table size 34673, node heap has 1 buffer(s)
Hash table size 34673, node heap has 2 buffer(s)
Hash table size 34673, node heap has 2 buffer(s)
Hash table size 34673, node heap has 12 buffer(s)
Hash table size 34673, node heap has 1 buffer(s)
Hash table size 34673, node heap has 1 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 44613787
Log flushed up to   44613787
Pages flushed up to 44613787
Last checkpoint at  44613778
0 pending log flushes, 0 pending chkp writes
3412 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 137428992
Dictionary memory allocated 924661
Buffer pool size   8191
Free buffers       5612
Database pages     2556
Old database pages 923
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 410, created 2146, written 3707
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 2556, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
5 read views open inside InnoDB
Process ID=1, Main thread ID=139799528855296, state: sleeping
Number of rows inserted 187012, updated 52, deleted 4, read 102952428
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s

Below is the error we get from the automated vector insertion system.

score-service_1                 | ERROR:root:An error occurred.
score-service_1                 | Traceback (most recent call last):
score-service_1                 |   File "/usr/src/app/src/db/mysql.py", line 103, in execute
score-service_1                 |     self.cursor.execute(sql, parameters)
score-service_1                 |   File "/usr/local/lib/python3.7/site-packages/MySQLdb/cursors.py", line 250, in execute
score-service_1                 |     self.errorhandler(self, exc, value)
score-service_1                 |   File "/usr/local/lib/python3.7/site-packages/MySQLdb/connections.py", line 50, in defaulterrorhandler
score-service_1                 |     raise errorvalue
score-service_1                 |   File "/usr/local/lib/python3.7/site-packages/MySQLdb/cursors.py", line 247, in execute
score-service_1                 |     res = self._query(query)
score-service_1                 |   File "/usr/local/lib/python3.7/site-packages/MySQLdb/cursors.py", line 411, in _query
score-service_1                 |     rowcount = self._do_query(q)
score-service_1                 |   File "/usr/local/lib/python3.7/site-packages/MySQLdb/cursors.py", line 374, in _do_query
score-service_1                 |     db.query(q)
score-service_1                 |   File "/usr/local/lib/python3.7/site-packages/MySQLdb/connections.py", line 277, in query
score-service_1                 |     _mysql.connection.query(self, query)
score-service_1                 | _mysql_exceptions.OperationalError: (1205, 'Lock wait timeout exceeded; try restarting transaction')

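There don't appear to be any table locks. Essentially, my question is: why is this happening? Is my guess about why the transactions hang correct? Are there any diagnostic tools or methods I can use to investigate the problem further? I'm still fairly new to SQL.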

mysql database database-design database-deadlocks
1 answer
0 votes

No BEGIN? Is it running with autocommit = OFF? Is the entire program one huge transaction? No wonder it times out.
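The status output in the question already shows the symptom: six transactions have been ACTIVE for between 1960 and 3832 seconds. As a rough diagnostic sketch (the function name and the 60-second threshold are illustrative choices, not part of any MySQL tooling), the TRANSACTIONS section can be scanned for long-lived transactions like this:

```python
import re

# Matches headers such as "---TRANSACTION 5845, ACTIVE 1960 sec" from the
# TRANSACTIONS section of SHOW ENGINE INNODB STATUS.
ACTIVE_TRX = re.compile(r"^---TRANSACTION (\d+), ACTIVE (\d+) sec")
THREAD_ID = re.compile(r"^MySQL thread id (\d+),")


def long_running_transactions(status_text, min_seconds=60):
    """Return (trx_id, active_seconds, thread_id) tuples, longest first.

    thread_id is None when no "MySQL thread id" line follows the header.
    """
    found = []
    current = None
    for line in status_text.splitlines():
        header = ACTIVE_TRX.match(line)
        if header:
            current = [int(header.group(1)), int(header.group(2)), None]
            found.append(current)
            continue
        if line.startswith("---"):
            # A transaction that is not ACTIVE, or the start of the next section.
            current = None
            continue
        thread = THREAD_ID.match(line)
        if thread and current is not None:
            current[2] = int(thread.group(1))
    rows = [tuple(item) for item in found if item[1] >= min_seconds]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# A few lines copied from the status output in the question.
sample = """\
---TRANSACTION 421274805681392, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 5845, ACTIVE 1960 sec
6 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 1
MySQL thread id 18460, OS thread handle 139799096174336, query id 110111 172.18.0.17 root
---TRANSACTION 5802, ACTIVE 3832 sec
1 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 11, OS thread handle 139799379732224, query id 107906 172.18.0.20 root
"""

report = long_running_transactions(sample)
for trx_id, seconds, thread_id in report:
    print(f"trx {trx_id}: open for {seconds}s on MySQL thread {thread_id}")
```

The thread id is the same Id that SHOW PROCESSLIST reports, so it points at the client connection that left the transaction open. On MySQL 5.7 the information_schema.INNODB_TRX table exposes the same data without any text parsing.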

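A "transaction" should be a small set of DB operations that need to run "atomically". That is, either all of them succeed or all of them fail, with nothing in between.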

Typically, a transaction starts with something like BEGIN or START TRANSACTION (interfaces have different variants) and ends with COMMIT.
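A minimal sketch of one short transaction per insert follows. It uses the standard-library sqlite3 module as a stand-in so it runs without a MySQL server; the table and column names are hypothetical, and whether a single component or a whole item should be the atomic unit is a design decision the question does not settle. With MySQLdb the placeholders would be %s rather than ?, and the boundaries would normally be conn.commit() and conn.rollback().

```python
import sqlite3

# In-memory SQLite stands in for MySQL so the sketch runs without a server.
# isolation_level=None turns off the driver's implicit transactions, so the
# explicit BEGIN / COMMIT below are the only transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE feature_vector ("
    " item_id INTEGER, component INTEGER, payload BLOB)"
)


def store_component(conn, item_id, component, payload):
    """Insert one component vector inside its own short transaction."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "INSERT INTO feature_vector (item_id, component, payload)"
            " VALUES (?, ?, ?)",
            (item_id, component, payload),
        )
        conn.execute("COMMIT")
    except Exception:
        # Undo the partial work so no locks are left behind, then re-raise.
        conn.execute("ROLLBACK")
        raise


# Two placeholder payloads; real code would pass the serialized vectors.
for component, payload in enumerate([b"\x00" * 8, b"\x01" * 8], start=1):
    store_component(conn, 1, component, payload)

stored = conn.execute("SELECT COUNT(*) FROM feature_vector").fetchone()[0]
print(stored, "rows committed; in_transaction =", conn.in_transaction)
```

The point of the shape is that every code path ends in COMMIT or ROLLBACK, so no transaction stays open after the function returns.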

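When you include a lot of time-consuming client code inside a transaction, competing transactions (from other clients) can stall while they wait for access to whatever is locked. Try to avoid that.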

Some designs follow this pattern:

1. get a thing
2. do a lot of time-consuming processing on that thing
3. release that thing

It is tempting to put BEGIN in step 1 and COMMIT in step 3, but the slow work in step 2 makes that unwise. Instead:

1a. BEGIN
1b. Find a thing to work on
1c. store (`UPDATE`) an indication that this client has grabbed the thing
1d. COMMIT
2. do a lot of time-consuming processing on that thing
3a. BEGIN
3b. Release (`UPDATE`) the thing
3c. COMMIT
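The steps above can be sketched as follows. This is an illustration under stated assumptions, not the asker's code: the job table, the worker name and the helper functions are hypothetical, and sqlite3 stands in for MySQL so the example is runnable. On MySQL the claim step would typically use SELECT ... FOR UPDATE inside the short transaction instead of SQLite's BEGIN IMMEDIATE.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE job (id INTEGER PRIMARY KEY, owner TEXT, done INTEGER DEFAULT 0)"
)
conn.executemany("INSERT INTO job (id) VALUES (?)", [(1,), (2,)])


def claim(conn, worker):
    """Steps 1a-1d: find an unclaimed job, mark it as ours, commit."""
    conn.execute("BEGIN IMMEDIATE")  # SQLite-specific: take the write lock up front
    row = conn.execute(
        "SELECT id FROM job WHERE owner IS NULL AND done = 0 ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        conn.execute("COMMIT")
        return None
    conn.execute("UPDATE job SET owner = ? WHERE id = ?", (worker, row[0]))
    conn.execute("COMMIT")
    return row[0]


def slow_processing(job_id):
    """Step 2: stand-in for the expensive work; no transaction is open here."""
    time.sleep(0.01)
    return job_id * 2


def release(conn, job_id):
    """Steps 3a-3c: record completion in a second short transaction."""
    conn.execute("BEGIN")
    conn.execute("UPDATE job SET done = 1, owner = NULL WHERE id = ?", (job_id,))
    conn.execute("COMMIT")


job_id = claim(conn, "worker-a")
open_during_step_2 = conn.in_transaction  # False: nothing is held while we work
result = slow_processing(job_id)
release(conn, job_id)
rows = conn.execute("SELECT id, owner, done FROM job ORDER BY id").fetchall()
print(job_id, result, rows)
```

The claim is recorded as ordinary data in the owner column rather than as a held lock, which is why other clients are not blocked while step 2 runs.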

This pattern scales better.

It may be possible to collapse steps 1a-1d into a single autocommitted statement. Ditto for steps 3a-3c.
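One way such a single-statement claim might look is sketched below, again with sqlite3 as a runnable stand-in and a hypothetical job table. The SQL differs between engines: MySQL rejects a subquery on the table being updated (error 1093), and the usual MySQL spelling is given in the comment.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so each
# statement below is its own transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, owner TEXT)")
conn.executemany("INSERT INTO job (id) VALUES (?)", [(1,), (2,)])

# Form that SQLite accepts. On MySQL the equivalent single statement is
#   UPDATE job SET owner = %s WHERE owner IS NULL ORDER BY id LIMIT 1
CLAIM_SQL = (
    "UPDATE job SET owner = ?"
    " WHERE id = (SELECT MIN(id) FROM job WHERE owner IS NULL)"
)


def claim_one(conn, worker):
    """Claim the lowest unclaimed job in one statement; True if one was claimed."""
    cur = conn.execute(CLAIM_SQL, (worker,))
    return cur.rowcount == 1


claimed = [claim_one(conn, name) for name in ("worker-a", "worker-b", "worker-c")]
owners = conn.execute("SELECT id, owner FROM job ORDER BY id").fetchall()
print(claimed, owners)
```

The worker then looks up its job with a plain SELECT on the owner column, which needs no long-lived lock.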

That lets the queueing scale and keeps the workers out of each other's hair. It also goes a long way toward avoiding "Lock wait timeout exceeded".
