bdr_init_copy hangs indefinitely

Question

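I am fairly new to PostgreSQL but have to set up replication. I decided to use BDR, which worked well in a local demo, but on distributed machines it has started to go wrong, mainly because I don't know exactly what I'm doing; my background is in MySQL. I have almost got BDR working across multiple servers. When I run: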

SELECT bdr.bdr_node_join_wait_for_ready();

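it hangs on the joining nodes. This happens on both DB2 and DB3; DB1 returns a valid response. While researching this I came across the bdr_init_copy command, which apparently does everything I had been doing by hand, and then some. So I tried it. Now, when I run: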

/usr/lib/postgresql/9.4/bin/bdr_init_copy -d "host=192.168.1.10 dbname=demo3" --local-dbname="host=192.168.1.23 dbname=demo3" -n db2 -D bdr_data

I get:

bdr_init_copy: starting ...
Getting remote server identification ...
Detected 2 BDR database(s) on remote server
Updating BDR configuration on the remote node:
 demo2: creating replication slot ...
 demo2: creating node entry for local node ...
 demo3: creating replication slot ...
 demo3: creating node entry for local node ...
Creating base backup of the remote node...
63655/63655 kB (100%), 1/1 tablespace
Creating restore point on remote node ...
Bringing local node to the restore point ...

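and it just sits there. I assume both problems have the same cause. As far as I can tell, no log entries are created on the local node (db2), but the remote node (db1) logs the following: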

2016-10-12 22:38:43 UTC [20808-1] postgres@demo2 LOG:  logical decoding found consistent point at 0/5001F00
2016-10-12 22:38:43 UTC [20808-2] postgres@demo2 DETAIL:  There are no running transactions.
2016-10-12 22:38:43 UTC [20808-3] postgres@demo2 STATEMENT:  SELECT pg_create_logical_replication_slot('bdr_17163_6340711416785871202_2_17163__', 'bdr');
2016-10-12 22:38:43 UTC [20811-1] postgres@demo3 LOG:  logical decoding found consistent point at 0/5002090
2016-10-12 22:38:43 UTC [20811-2] postgres@demo3 DETAIL:  There are no running transactions.
2016-10-12 22:38:43 UTC [20811-3] postgres@demo3 STATEMENT:  SELECT pg_create_logical_replication_slot('bdr_17939_6340711416785871202_2_17939__', 'bdr');
2016-10-12 22:38:44 UTC [20812-1] postgres@demo3 LOG:  restore point "bdr_6340711416785871202" created at 0/50022A8
2016-10-12 22:38:44 UTC [20812-2] postgres@demo3 STATEMENT:  SELECT pg_create_restore_point('bdr_6340711416785871202')

Any help?

Answer 1

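Yes, I just had this problem, and none of the other forums were any help. Some of them even said it is fine for the new node to report its own status as 'o' while the other nodes report the new server's status as 'i', because "it's just a bug and it's fine". It is not fine. The new server can receive replicated updates, but no master updates can be made on the new server.

The key to solving this is to turn on logging on the server you are joining TO (not the new server). In the new server's log you will probably see something like: 08006: could not receive data from client: Connection reset by peer, which is not very helpful and will send you off checking firewalls and so on. The real payoff comes from the source server's log, once logging is enabled there, for example: no free replication state could be found for 11, increase max_replication_slots. What is probably happening is that either you have too many servers in your cluster for the default settings or, more likely, old hosts have left some junk behind.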
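To see how close the source server is to the limit named in that log message, a quick check along these lines may help. It is only a sketch: it uses the standard pg_replication_slots view plus the pg_replication_identifier catalog that appears later in this answer, and it assumes (inferred from the wording of the log message, not confirmed here) that replication state entries are bounded by max_replication_slots as well.

```sql
-- Configured ceiling; changing it in postgresql.conf requires a server restart.
SHOW max_replication_slots;

-- Replication slots currently defined on this server, split by whether they are in use.
SELECT active, count(*) AS slots
FROM pg_replication_slots
GROUP BY active;

-- Replication identifiers currently registered (assumption: these count against
-- the same limit, as the "increase max_replication_slots" hint suggests).
SELECT count(*) AS identifiers
FROM pg_replication_identifier;
```

If the counts are at or near the configured value and some entries belong to hosts that no longer exist, leftover junk is the more likely cause than a genuinely oversized cluster.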

You need to clean up... on EVERY server in the existing cluster (take care!). First, get a list of what the existing cluster knows about:

select * from bdr.bdr_nodes order by node_sysid;

Then check the following:

select conn_sysid,conn_dboid from bdr.bdr_connections order by conn_sysid;

..if you see old entries (ones whose sysid does not appear as a node_sysid in the first query), delete them, e.g. delete from bdr.bdr_connections where conn_sysid='<from-first-query>';
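Rather than comparing the two result sets by eye, the leftover connection rows can be listed with an anti-join. This is a sketch that relies only on the node_sysid and conn_sysid columns used in the queries above; it lists candidates and deletes nothing.

```sql
-- Connection rows whose sysid no longer matches any known node.
-- Review the output before deleting anything.
SELECT c.conn_sysid, c.conn_dboid
FROM bdr.bdr_connections AS c
WHERE NOT EXISTS (
    SELECT 1
    FROM bdr.bdr_nodes AS n
    WHERE n.node_sysid = c.conn_sysid
)
ORDER BY c.conn_sysid;
```

An empty result means there is nothing stale in bdr.bdr_connections on this server.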

select * from pg_replication_slots order by slot_name;

..if you see old entries that do not contain an active sysid, drop them. NB: use the function, do not run a "delete", e.g. select pg_drop_replication_slot('bdr_17213_6574566740899221664_1_17213__');
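The same check can be sketched as a query. It assumes that a BDR slot name embeds the sysid of the peer node, as the slot names in the question's log appear to, and that the slots use the 'bdr' output plugin shown there. The substring match is a heuristic, so treat the output as a list of candidates to inspect, not as a list that is safe to drop blindly.

```sql
-- BDR slots whose name contains none of the sysids listed in bdr.bdr_nodes.
SELECT s.slot_name, s.active
FROM pg_replication_slots AS s
WHERE s.plugin = 'bdr'
  AND NOT EXISTS (
      SELECT 1
      FROM bdr.bdr_nodes AS n
      WHERE position(n.node_sysid::text in s.slot_name::text) > 0
  )
ORDER BY s.slot_name;
```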

select * from pg_replication_identifier order by riname;

..if you see old entries that do not contain an active sysid, drop them. NB: again, use the function, do not run a "delete":

select pg_replication_identifier_drop('bdr_6443767151306784833_1_17210_17213_');
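A matching sketch for replication identifiers, under the same assumption that riname embeds the peer's sysid (as in the example name above). It only lists candidates; dropping them is still done one at a time with pg_replication_identifier_drop.

```sql
-- Replication identifiers whose name contains none of the sysids in bdr.bdr_nodes.
SELECT r.riname
FROM pg_replication_identifier AS r
WHERE NOT EXISTS (
    SELECT 1
    FROM bdr.bdr_nodes AS n
    WHERE position(n.node_sysid::text in r.riname::text) > 0
)
ORDER BY r.riname;
```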

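With luck, once you have done this on every node, you will see the new server's BDR status go to 'r'. As you clean up each host, you should notice that the log message "08006: could not receive data from client: Connection reset by peer" matching the conn_sysid of the server you just cleaned stops occurring. Good luck.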
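To watch for that status change, a narrower form of the first query can be re-run on each node. This sketch assumes the status letters discussed above ('o', 'i', 'r') are stored in the node_status column of bdr.bdr_nodes.

```sql
-- One row per known node; the new server should eventually show 'r' on every node.
SELECT node_sysid, node_status
FROM bdr.bdr_nodes
ORDER BY node_sysid;
```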
