PostgreSQL 9.6 stays in recovery state

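We have one primary server (9.6) and one replica. It ran fine for more than a year. We had to reconfigure the number of CPUs on the primary and forgot to apply the same change to the replica, which effectively broke it because the archived pg_xlog files it needed were missing (we keep only a small number of them). We tried to reinitialize the replica as usual (even reinstalling the VM and PostgreSQL), but the end result is always: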

FATAL: the database system is starting up.

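The configuration has been verified on both the primary and the standby and looks fine. The standby is replaying consecutive segments and is not stuck on any single one. There is free disk space on both the primary and the standby. We cannot connect to the database with psql.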

strace on the process that is recovering segments:

lseek(907, 0, SEEK_END)                 = 429228032
read(6, 0x7ffffb3dafc7, 1)              = -1 EAGAIN (Resource temporarily unavailable)
lseek(3, 11747328, SEEK_SET)            = 11747328
read(3, "\223\320\1\0\2\0\0\0\0@\263\272V\25\0\0y\v\0\0\0\0\0\0\0\210\20\33\0008@["..., 8192) = 8192
read(6, 0x7ffffb3dafc7, 1)              = -1 EAGAIN (Resource temporarily unavailable)
lseek(808, 0, SEEK_END)                 = 44916736
lseek(3, 11755520, SEEK_SET)            = 11755520
read(3, "\223\320\1\0\2\0\0\0\0`\263\272V\25\0\0\325\10\0\0\0\0\0\00025400041"..., 8192) = 8192
read(6, 0x7ffffb3dafc7, 1)              = -1 EAGAIN (Resource temporarily unavailable)
lseek(840, 0, SEEK_END)                 = 860676096
read(6, 0x7ffffb3dafc7, 1)              = -1 EAGAIN (Resource temporarily unavailable)
lseek(855, 0, SEEK_END)                 = 302235648
read(6, 0x7ffffb3dafc7, 1)              = -1 EAGAIN (Resource temporarily unavailable)

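Ideas:

  • Permissions on recovery.conf (set to 600)
  • PostgreSQL is not reading postgresql.conf (unlikely)
  • Network problems
  • Corruption on the primary
  • A transaction on the primary is blocking recovery on the standby

Any thoughts?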

The replica was initialized with:

pg_basebackup -D /var/lib/pgsql/9.6/data -P --xlog-method=stream -R --checkpoint=fast -U replica -W -h IP

pg_log from before it ran into the missing pg_xlog file and began accepting connections:

< 2019-10-24 14:02:52.930 CEST > FATAL:  the database system is shutting down
< 2019-10-24 14:03:32.169 CEST > LOG:  shutting down
< 2019-10-24 14:03:57.270 CEST > LOG:  database system is shut down
< 2019-10-24 14:04:55.694 CEST > LOG:  00000: database system was shut down in recovery at 2019-10-24 14:03:57 CEST
< 2019-10-24 14:04:55.694 CEST > LOCATION:  StartupXLOG, xlog.c:6060
< 2019-10-24 14:04:55.694 CEST > LOG:  00000: entering standby mode
< 2019-10-24 14:04:55.694 CEST > LOCATION:  StartupXLOG, xlog.c:6135
< 2019-10-24 14:04:55.712 CEST > LOG:  00000: redo starts at 1553/E2012790
< 2019-10-24 14:04:55.712 CEST > LOCATION:  StartupXLOG, xlog.c:6833
< 2019-10-24 14:06:02.320 CEST > FATAL:  57P03: the database system is starting up
< 2019-10-24 14:06:02.320 CEST > LOCATION:  ProcessStartupPacket, postmaster.c:2221
< 2019-10-24 14:12:43.461 CEST > LOG:  00000: received fast shutdown request
< 2019-10-24 14:12:43.461 CEST > LOCATION:  pmdie, postmaster.c:2679
< 2019-10-24 14:14:17.410 CEST > LOG:  00000: shutting down
< 2019-10-24 14:14:17.410 CEST > LOCATION:  ShutdownXLOG, xlog.c:8095
< 2019-10-24 14:15:41.730 CEST > LOG:  00000: database system is shut down
< 2019-10-24 14:15:41.730 CEST > LOCATION:  UnlinkLockFiles, miscinit.c:763
< 2019-10-24 14:17:13.492 CEST > LOG:  00000: database system was shut down in recovery at 2019-10-24 14:15:40 CEST
< 2019-10-24 14:17:13.492 CEST > LOCATION:  StartupXLOG, xlog.c:6060
< 2019-10-24 14:17:13.553 CEST > LOG:  00000: entering standby mode
< 2019-10-24 14:17:13.553 CEST > LOCATION:  StartupXLOG, xlog.c:6135
< 2019-10-24 14:17:15.654 CEST > LOG:  00000: redo starts at 1555/C0019B8
< 2019-10-24 14:17:15.654 CEST > LOCATION:  StartupXLOG, xlog.c:6833
< 2019-10-24 14:17:29.507 CEST > FATAL:  57P03: the database system is starting up
< 2019-10-24 14:17:29.507 CEST > LOCATION:  ProcessStartupPacket, postmaster.c:2221
< 2019-10-24 14:29:46.171 CEST > LOG:  00000: consistent recovery state reached at 1557/5A30CB8
< 2019-10-24 14:29:46.171 CEST > LOCATION:  CheckRecoveryConsistency, xlog.c:7647
< 2019-10-24 14:29:46.171 CEST > LOG:  00000: database system is ready to accept read only connections
< 2019-10-24 14:29:46.171 CEST > LOCATION:  sigusr1_handler, postmaster.c:5023
< 2019-10-24 14:29:46.386 CEST > LOG:  00000: started streaming WAL from primary at 1557/6000000 on timeline 2
< 2019-10-24 14:29:46.386 CEST > LOCATION:  WalReceiverMain, walreceiver.c:384
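
To put a number on how slow replay was, the second startup in the log above can be used for a rough estimate: "redo starts at 1555/C0019B8" is logged at 14:17:15.654 and "consistent recovery state reached at 1557/5A30CB8" at 14:29:46.171. The following is a minimal Python sketch, not something from the original post. The function names lsn_to_int and replay_rate are made up for illustration, and it assumes the usual LSN text format of two hexadecimal halves (high 32 bits / low 32 bits).

```python
from datetime import datetime


def lsn_to_int(lsn: str) -> int:
    """Convert an LSN such as '1555/C0019B8' to an absolute byte position."""
    hi, lo = lsn.split("/")
    # The part before the slash is the high 32 bits, the part after is the low 32 bits.
    return (int(hi, 16) << 32) | int(lo, 16)


def replay_rate(start_ts: str, start_lsn: str, end_ts: str, end_lsn: str):
    """Return (bytes replayed, elapsed seconds, bytes per second)."""
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    elapsed = datetime.strptime(end_ts, fmt) - datetime.strptime(start_ts, fmt)
    seconds = elapsed.total_seconds()
    nbytes = lsn_to_int(end_lsn) - lsn_to_int(start_lsn)
    return nbytes, seconds, nbytes / seconds


# Timestamps and LSNs copied from the log lines above (second startup).
nbytes, seconds, rate = replay_rate(
    "2019-10-24 14:17:15.654", "1555/C0019B8",
    "2019-10-24 14:29:46.171", "1557/5A30CB8",
)
print(f"{nbytes / 2**30:.2f} GiB replayed in {seconds:.0f} s "
      f"({rate / 2**20:.1f} MiB/s)")
```

For this log that is about 7.9 GiB of WAL in roughly 12.5 minutes. It is only an estimate between two log lines, but it gives a baseline to compare against after any tuning.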

postgresql postgresql-9.6
1 Answer

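In the end we got it up and running. What we did: moved unused data to a separate, archive-like database server, ran VACUUM FULL on some large tables, and ran REINDEX on a few hundred gigabytes of indexes. We also changed effective_io_concurrency (raised it), since our mixed disk configuration (which differs between the primary and the replica) may have had some effect. Now, instead of waiting for hours on log replay, the database starts accepting connections within a minute, and replication works fine.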
