Cassandra not showing node up hours after restart


I'm performing a rolling restart of a 4-node cluster running Cassandra 2.1.9. I stopped and started Cassandra on node 1 via "service cassandra stop/start", and there are no exceptions in either system.log or cassandra.log. Running "nodetool status" from node 1 shows all four nodes up:

user@node001=> nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns    Host ID                               Rack
UN  192.168.187.121  538.95 GB  256     ?       c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
UN  192.168.187.122  630.72 GB  256     ?       bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
UN  192.168.187.123  572.73 GB  256     ?       273df9f3-e496-4c65-a1f2-325ed288a992  rack1
UN  192.168.187.124  625.05 GB  256     ?       b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1

But running the same command from any of the other nodes shows node 1 as still down:

user@node002=> nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns    Host ID                               Rack
DN  192.168.187.121  538.94 GB  256     ?       c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
UN  192.168.187.122  630.72 GB  256     ?       bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
UN  192.168.187.123  572.73 GB  256     ?       273df9f3-e496-4c65-a1f2-325ed288a992  rack1
UN  192.168.187.124  625.04 GB  256     ?       b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1

"nodetool compactionstats" shows no pending tasks, and "nodetool netstats" shows nothing unusual. More than 12 hours have passed and the inconsistency persists. Another example: when I run "nodetool gossipinfo" on the restarted node, its status shows as NORMAL:

user@node001=> nodetool gossipinfo
/192.168.187.121
  generation:1574364410
  heartbeat:209150
  NET_VERSION:8
  RACK:rack1
  STATUS:NORMAL,-104847506331695918
  RELEASE_VERSION:2.1.9
  SEVERITY:0.0
  LOAD:5.78684155614E11
  HOST_ID:c99cf581-f4ae-4aa9-ab37-1a114ab2429b
  SCHEMA:fd2dcb4b-ca62-30df-b8f2-d3fd774f2801
  DC:datacenter1
  RPC_ADDRESS:192.168.185.121

But from another node, it shows node001's status as "shutdown":

user@node002=> nodetool gossipinfo
/192.168.187.121
  generation:1491825076
  heartbeat:2147483647
  STATUS:shutdown,true
  RACK:rack1
  NET_VERSION:8
  LOAD:5.78679987693E11
  RELEASE_VERSION:2.1.9
  DC:datacenter1
  SCHEMA:fd2dcb4b-ca62-30df-b8f2-d3fd774f2801
  HOST_ID:c99cf581-f4ae-4aa9-ab37-1a114ab2429b
  RPC_ADDRESS:192.168.185.121
  SEVERITY:0.0
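
For what it's worth, a quick way to compare every node's view of node001's gossip state is to loop over the hosts. This is only a sketch; the host names and password-less SSH access are assumptions:

for h in node001 node002 node003 node004; do
  echo "== view from $h =="
  # Print only node001's STATUS line from this host's gossip table
  ssh "$h" nodetool gossipinfo | awk '/^\/192.168.187.121/{f=1; next} /^\//{f=0} f && /STATUS:/'
done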

Is there anything I can do to correct this situation so that I can continue the rolling restart?

cassandra cassandra-2.0
1 Answer
1 vote

Here's what I ended up doing to get the "bad" node back into the cluster and finish the rolling restart:

Perform a full shutdown

nodetool disablethrift    # stop accepting Thrift client connections
nodetool disablebinary    # stop accepting native-protocol (CQL) connections
sleep 5
nodetool disablegossip    # leave the gossip ring so peers mark this node down
nodetool drain            # flush memtables and stop listening for connections
sleep 10
/sbin/service cassandra restart

Monitor for the node to come back up

# Poll cqlsh until the local node answers queries again
until echo "SELECT * FROM system.peers LIMIT 1;" | cqlsh `hostname` > /dev/null 2>&1; do echo "Node is still DOWN"; sleep 10; done && echo "Node is now UP"

Remove the restarted node from the cluster

From another node in the cluster, run:

nodetool removenode <host-id>
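
The host ID comes from the Host ID column of nodetool status on a surviving node. As a sketch, with the down node's IP from this cluster hard-coded:

# Print the Host ID column for the down node
nodetool status | awk '/192.168.187.121/ {print $7}'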

Perform a second full shutdown

nodetool disablethrift
nodetool disablebinary
sleep 5
nodetool disablegossip
nodetool drain
sleep 10
/sbin/service cassandra restart

Monitor for the node to come back up

until echo "SELECT * FROM system.peers LIMIT 1;" | cqlsh `hostname` > /dev/null 2>&1; do echo "Node is still DOWN"; sleep 10; done && echo "Node is now UP"

Verify that the restarted node has rejoined the cluster

Tail the /var/log/cassandra/system.log file on one or more of the other nodes, looking for messages like these:

INFO  [HANDSHAKE-/192.168.187.124] 2019-12-12 19:17:33,654 OutboundTcpConnection.java:485 - Handshaking version with /192.168.187.124
INFO  [GossipStage:1] 2019-12-12 19:18:23,212 Gossiper.java:1019 - Node /192.168.187.124 is now part of the cluster
INFO  [SharedPool-Worker-1] 2019-12-12 19:18:23,213 Gossiper.java:984 - InetAddress /192.168.187.124 is now UP
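
A sketch of a one-liner to watch for those messages as they arrive (the grep patterns simply match the log lines above):

tail -f /var/log/cassandra/system.log | grep -E 'Handshaking version|now part of the cluster|is now UP'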

Verify that the expected number of nodes are in the cluster

The output of the following command should be identical on all nodes:

nodetool status
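
One way to spot-check this without logging into each machine, again assuming password-less SSH and the same hypothetical host names as above:

for h in node001 node002 node003 node004; do
  echo -n "$h sees this many UN nodes: "
  # UN = Up/Normal in the first column of nodetool status
  ssh "$h" nodetool status | grep -c '^UN'
done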