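We are seeing errors on node 1 of a 5-node cluster. From the client's point of view, queries against node 1 appear to succeed, but the inserts fail. Even though autoinc should not be involved in these update queries, we see a lot of autoinc errors. This also seems to cause performance problems until a higher-priority transaction comes along and takes the node offline to perform transaction replay. Below are some entries from error.log with debugging enabled, plus a rundown of the setup. We don't know how to troubleshoot this any further.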
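The only way to get transactions moving again is for every client to drop and rebuild its connection pool.
Some details of the setup:
Here are some of the errors: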
150703 5:56:27 [Note] WSREP: DUPKEY error for autoinc
THD 5041, value 133622, off 2 inc 5
150703 5:56:27 [Note] WSREP: retrying insert: INSERT INTO `server_live` (server_id, performance_30, performance_120, performance_300, performance_600, players_online, staff_online, staff_last_seen, uptime, worlds_loaded, chunks_loaded, entities_loaded, tileEntities_loaded) VALUES (79, 100, 100, 99, 99, 2, '{}', staff_last_seen, 15568, 13, 789, 384, 1101) ON DUPLICATE KEY UPDATE performance_30 = 100, performance_120 = 100, performance_300 = 99, performance_600 = 99, players_online = 2, staff_online = '{}', staff_last_seen = staff_last_seen, uptime = 15568, worlds_loaded = 13, chunks_loaded = 789, entities_loaded = 384, tileEntities_loaded = 1101
150703 5:56:27 [Note] WSREP: innobase_commit, abort INSERT INTO `server_live` (server_id, performance_30, performance_120, performance_300, performance_600, players_online, staff_online, staff_last_seen, uptime, worlds_loaded, chunks_loaded, entities_loaded, tileEntities_loaded) VALUES (79, 100, 100, 99, 99, 2, '{}', staff_last_seen, 15568, 13, 789, 384, 1101) ON DUPLICATE KEY UPDATE performance_30 = 100, performance_120 = 100, performance_300 = 99, performance_600 = 99, players_online = 2, staff_online = '{}', staff_last_seen = staff_last_seen, uptime = 15568, worlds_loaded = 13, chunks_loaded = 789, entities_loaded = 384, tileEntities_loaded = 1101
150703 5:56:27 [Note] WSREP: cleanup transaction for LOCAL_STATE: INSERT INTO `server_live` (server_id, performance_30, performance_120, performance_300, performance_600, players_online, staff_online, staff_last_seen, uptime, worlds_loaded, chunks_loaded, entities_loaded, tileEntities_loaded) VALUES (79, 100, 100, 99, 99, 2, '{}', staff_last_seen, 15568, 13, 789, 384, 1101) ON DUPLICATE KEY UPDATE performance_30 = 100, performance_120 = 100, performance_300 = 99, performance_600 = 99, players_online = 2, staff_online = '{}', staff_last_seen = staff_last_seen, uptime = 15568, worlds_loaded = 13, chunks_loaded = 789, entities_loaded = 384, tileEntities_loaded = 1101
150703 5:56:27 [Note] WSREP: wsrep retrying AC query: INSERT INTO `server_live` (server_id, performance_30, performance_120, performance_300, performance_600, players_online, staff_online, staff_last_seen, uptime, worlds_loaded, chunks_loaded, entities_loaded, tileEntities_loaded) VALUES (79, 100, 100, 99, 99, 2, '{}', staff_last_seen, 15568, 13, 789, 384, 1101) ON DUPLICATE KEY UPDATE performance_30 = 100, performance_120 = 100, performance_300 = 99, performance_600 = 99, players_online = 2, staff_online = '{}', staff_last_seen = staff_last_seen, uptime = 15568, worlds_loaded = 13, chunks_loaded = 789, entities_loaded = 384, tileEntities_loaded = 1101
150703 5:56:27 [Note] WSREP: DUPKEY error for autoinc
THD 5041, value 133627, off 2 inc 5
150703 5:56:27 [Note] WSREP: releasing retry_query: conf 0 sent 0 kill 0 errno 0 SQL INSERT INTO `server_live` (server_id, performance_30, performance_120, performance_300, performance_600, players_online, staff_online, staff_last_seen, uptime, worlds_loaded, chunks_loaded, entities_loaded, tileEntities_loaded) VALUES (79, 100, 100, 99, 99, 2, '{}', staff_last_seen, 15568, 13, 789, 384, 1101) ON DUPLICATE KEY UPDATE performance_30 = 100, performance_120 = 100, performance_300 = 99, performance_600 = 99, players_online = 2, staff_online = '{}', staff_last_seen = staff_last_seen, uptime = 15568, worlds_loaded = 13, chunks_loaded = 789, entities_loaded = 384, tileEntities_loaded = 1101
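When the log grows, it helps to pull the DUPKEY entries out mechanically. As a minimal sketch (the helper names summarize_dupkey and fits_sequence are hypothetical, and the regex assumes the exact two-line message layout shown above), the following extracts the THD, value, off and inc fields and checks whether each value lies on the series off, off + inc, off + 2·inc, and so on:

```python
import re
from collections import defaultdict

# Assumes the two-line message layout shown in error.log above:
#   "... WSREP: DUPKEY error for autoinc"
#   "THD 5041, value 133622, off 2 inc 5"
DUPKEY_RE = re.compile(
    r"DUPKEY error for autoinc\s+THD (\d+), value (\d+), off (\d+) inc (\d+)"
)


def summarize_dupkey(log_text):
    """Group DUPKEY autoinc events by thread id (THD), in log order.

    Returns {thd: [(value, off, inc), ...]}.
    """
    events = defaultdict(list)
    for match in DUPKEY_RE.finditer(log_text):
        thd, value, off, inc = map(int, match.groups())
        events[thd].append((value, off, inc))
    return dict(events)


def fits_sequence(value, off, inc):
    """True if value lies on the series off, off + inc, off + 2*inc, ..."""
    return value >= off and (value - off) % inc == 0


if __name__ == "__main__":
    # The two DUPKEY entries copied from the log above.
    sample = (
        "150703 5:56:27 [Note] WSREP: DUPKEY error for autoinc\n"
        "THD 5041, value 133622, off 2 inc 5\n"
        "150703 5:56:27 [Note] WSREP: DUPKEY error for autoinc\n"
        "THD 5041, value 133627, off 2 inc 5\n"
    )
    for thd, hits in summarize_dupkey(sample).items():
        print(f"THD {thd}: {len(hits)} DUPKEY event(s)")
        for value, off, inc in hits:
            print(f"  value {value} on off={off}/inc={inc} series:",
                  fits_sequence(value, off, inc))
```

On the two entries above, both events belong to the same thread (THD 5041), both values fit the off 2 / inc 5 series, and they differ by exactly one increment.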
[MYSQLD]
datadir=/data
log-error=/data/error.log
query_cache_size=0
binlog_format=ROW
query_cache_type=0
bind-address=0.0.0.0
port=3304
innodb_buffer_pool_size=2048M
innodb_flush_log_at_trx_commit=0
innodb_read_io_threads=4
innodb_write_io_threads=4
innodb_io_capacity=200
innodb_doublewrite=1
innodb_log_file_size=512M
innodb_log_buffer_size=64M
innodb_buffer_pool_instances=4
innodb_log_files_in_group=2
innodb_thread_concurrency=64
innodb_flush_method = O_DIRECT
innodb_autoinc_lock_mode=2
innodb_stats_on_metadata=0
default_storage_engine=innodb
binlog_format=ROW
key_buffer_size = 24M
tmp_table_size = 64M
max_heap_table_size = 64M
max_allowed_packet = 512M
skip_name_resolve
memlock=0
sysdate_is_now=1
max_connections=512
thread_cache_size=512
query_cache_type = 0
query_cache_size = 0
table_open_cache=1024
lower_case_table_names=0
wait_timeout = 28800
explicit_defaults_for_timestamp=1
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_provider_options="gcache.size=2048M; evs.keepalive_period=PT3S; evs.inactive_check_period=PT10S; evs.suspect_timeout=PT30S; evs.inactive_timeout=PT1M; evs.install_timeout=PT1M; evs.send_window=1024; evs.user_send_window=512;"
wsrep_cluster_name="<removed>"
wsrep_cluster_address="<removed>"
wsrep_slave_threads=4
wsrep_certify_nonPK=1
wsrep_max_ws_rows=131072
wsrep_max_ws_size=1073741824
wsrep_debug=1
wsrep_convert_LOCK_to_trx=0
wsrep_retry_autocommit=10
wsrep_auto_increment_control=1
wsrep_replicate_myisam=1
wsrep_drupal_282555_workaround=1
wsrep_causal_reads=0
wsrep_sst_method=rsync
wsrep_log_conflicts=1
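The [MYSQLD] section above sets a few options more than once. A small sketch like the following (find_duplicate_options is a hypothetical helper, not a MySQL tool) lists every option assigned more than once within a section. It treats - and _ in option names as equivalent, as MySQL does, but it is not a full option-file parser: !include directives, quoting and inline comments are not handled.

```python
from collections import defaultdict


def find_duplicate_options(cnf_text):
    """Return {"section.option": [values...]} for options set more than once.

    Option names are compared after mapping '-' to '_' (MySQL treats the two
    as interchangeable). Options written without '=' (e.g. skip_name_resolve)
    are recorded with the value None. Sketch only: !include/!includedir,
    quoting and inline comments are not handled.
    """
    seen = defaultdict(list)
    section = ""
    for raw in cnf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or full-line comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        # Split on the first '=' only, so values containing '=' stay intact.
        name, sep, value = line.partition("=")
        key = name.strip().replace("-", "_")
        seen[(section, key)].append(value.strip() if sep else None)
    return {f"{sec}.{key}": vals
            for (sec, key), vals in seen.items() if len(vals) > 1}
```

Run over the configuration above, it would flag binlog_format, query_cache_size and query_cache_type, each of which is set twice with the same value.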
Update: as requested in the comments:
mysql> SHOW CREATE TABLE server_live;
+-------------+--------------+
| Table       | Create Table |
+-------------+--------------+
| server_live | CREATE TABLE `server_live` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `server_id` int(11) NOT NULL,
  `performance_30` int(11) NOT NULL,
  `performance_120` int(11) NOT NULL,
  `performance_300` int(11) NOT NULL,
  `performance_600` int(11) NOT NULL,
  `players_online` int(11) NOT NULL,
  `staff_online` varchar(255) NOT NULL DEFAULT '{}',
  `staff_last_seen` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `uptime` int(11) NOT NULL,
  `worlds_loaded` int(11) NOT NULL,
  `chunks_loaded` int(11) NOT NULL,
  `entities_loaded` int(11) NOT NULL,
  `tileEntities_loaded` int(11) NOT NULL,
  `timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `server_id_2` (`server_id`),
  CONSTRAINT `server_live_ibfk_1` FOREIGN KEY (`server_id`) REFERENCES `server` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=720312 DEFAULT CHARSET=utf8 |
+-------------+--------------+
1 row in set
mysql> SHOW VARIABLES LIKE 'auto%';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| auto_increment_increment | 5     |
| auto_increment_offset    | 3     |
| autocommit               | ON    |
| automatic_sp_privileges  | ON    |
+--------------------------+-------+
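Since the config sets wsrep_auto_increment_control=1, Galera adjusts auto_increment_increment and auto_increment_offset automatically as cluster membership changes, which is why the increment here is 5 on a 5-node cluster. The sketch below (hypothetical helper names) models how a value is picked from the series offset + N × increment: the next value is the smallest member of the series greater than the current maximum. It assumes 1 <= offset <= increment.

```python
from itertools import combinations


def next_autoinc(current_max, offset, increment):
    """Smallest value in offset + N*increment (N >= 0) greater than current_max.

    Assumes 1 <= offset <= increment (MySQL ignores the offset otherwise).
    """
    if not 1 <= offset <= increment:
        raise ValueError("expected 1 <= offset <= increment")
    if current_max < offset:
        return offset
    steps = (current_max - offset) // increment + 1
    return offset + steps * increment


def node_series(offset, increment, start, count):
    """The next `count` values a node with this offset would hand out after `start`."""
    values, current = [], start
    for _ in range(count):
        current = next_autoinc(current, offset, increment)
        values.append(current)
    return values


if __name__ == "__main__":
    increment = 5  # matches auto_increment_increment on the 5-node cluster
    series = {off: set(node_series(off, increment, 0, 1000)) for off in range(1, 6)}
    # Distinct offsets with a shared increment never hand out the same value.
    print(all(series[a].isdisjoint(series[b]) for a, b in combinations(series, 2)))
```

For example, next_autoinc(133622, 2, 5) returns 133627, which reproduces the step from the first to the second value in the DUPKEY log lines.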
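Check network connectivity: Since all nodes are connected over a WAN, make sure there are no network problems or latency issues between node 1 and the other nodes in the cluster. High latency or packet loss can cause problems with transaction replication and synchronization.
Check the Docker configuration: Verify that Docker is configured correctly on node 1 and the other nodes. Make sure the Docker network settings allow communication between containers running on different nodes. Check the Docker logs for errors or warnings that could point to container networking problems or resource limits.
Monitor database performance: Use database monitoring tools to analyze node 1's performance and identify resource bottlenecks such as CPU usage, memory usage, disk I/O, or network throughput. Look for spikes or anomalies in resource utilization that could be affecting database performance.
Review the MySQL configuration: Check the MySQL configuration on node 1 to make sure it is tuned for the available hardware. Pay attention to parameters related to transaction handling, replication, and auto-increment behavior. Make sure the auto-increment settings are consistent across the cluster: the same auto_increment_increment on every node, with a distinct auto_increment_offset on each.
Check replication lag: Monitor the replication status between node 1 and the other nodes in the cluster. Look for replication lag or inconsistencies that could lead to transaction errors or delays in data synchronization. If you detect lag, investigate likely causes such as network problems or resource limits.
Review the application code: Examine the application code that talks to node 1 for potential problems in query execution, error handling, or connection management. Make sure transactions are handled correctly and that the auto-increment column is not being updated or manipulated unintentionally.
Database maintenance: Perform routine maintenance such as database backups, index optimization, and query tuning to maintain performance and data integrity. Check for any pending schema changes or database migrations that could affect database operations on node 1.
Consult the database logs: Review the MySQL error log and query log on node 1 for error messages or warnings that could shed light on the root cause of the problem. Look for recurring patterns or anomalies that could indicate underlying problems with database operations.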
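Several of the steps above (checking replication lag, monitoring performance) can start from Galera's own status counters, returned by SHOW GLOBAL STATUS LIKE 'wsrep_%'. A minimal sketch, assuming the result has already been loaded into a dict of variable name to string value (fetching it through a database driver is left out). The helper name review_wsrep_status and both thresholds are illustrative assumptions, not recommended values:

```python
# Illustrative thresholds only (assumptions, not tuned recommendations).
MAX_RECV_QUEUE_AVG = 1.0   # average length of the incoming write-set queue
MAX_FC_PAUSED = 0.1        # fraction of time replication was paused by flow control


def review_wsrep_status(status, expected_cluster_size=5):
    """Flag suspicious values in SHOW GLOBAL STATUS LIKE 'wsrep_%' output.

    `status` maps variable name -> value as returned by the server (strings).
    Returns a list of human-readable findings; empty means nothing flagged.
    """
    findings = []
    size = int(status.get("wsrep_cluster_size", 0))
    if size != expected_cluster_size:
        findings.append(f"cluster size is {size}, expected {expected_cluster_size}")
    state = status.get("wsrep_local_state_comment", "")
    if state != "Synced":
        findings.append(f"local state is {state!r}, not 'Synced'")
    if float(status.get("wsrep_local_recv_queue_avg", 0)) > MAX_RECV_QUEUE_AVG:
        findings.append("receive queue is backing up (node is lagging)")
    if float(status.get("wsrep_flow_control_paused", 0)) > MAX_FC_PAUSED:
        findings.append("flow control is pausing replication")
    # These are cumulative counters; any non-zero value is reported here,
    # but comparing two snapshots over time is more informative.
    for counter in ("wsrep_local_cert_failures", "wsrep_local_bf_aborts"):
        if int(status.get(counter, 0)) > 0:
            findings.append(f"{counter} = {status[counter]}")
    return findings
```

In practice you would sample these values periodically on each node and compare the deltas, since the certification-failure and BF-abort counters only ever grow.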