One ELK node went down


My ELK cluster consists of three nodes: elk01, elk02 and elk03. One node, elk01, suddenly went down. When I checked the elk01 log /var/log/elasticsearch/elasticsearch.log, I found these errors:

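```
[elk01] Authentication of [elastic] was terminated by realm [reserved] - failed to authenticate user [elastic]
[elk01] Authentication of [kibana_system] was terminated by realm [reserved] - failed to authenticate user [kibana_system]
```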

I tried the following troubleshooting steps:

  • I restarted the elasticsearch service on all nodes.
  • I can telnet to ports 9200 and 9300 on all nodes.
  • I tried to reset the elastic password on node elk01 with this command: /usr/share/elasticsearch/bin/elasticsearch-reset-password -i -u elastic

But I get the following error:

Error: Failed to determine the health of the cluster.

I set xpack.security.enabled: false in elasticsearch.yml on all nodes, restarted elasticsearch, and ran the command above again, but I still could not reset the password.

From the elk02 and elk03 nodes I can get the index status with

curl http://elk02:9200/_cat/indices?v

and all indices are green.

Note: the cluster had been working fine. The problem appeared suddenly, without any configuration changes.

Update: contents of /var/log/elasticsearch/elasticsearch.log on elk01:

```
[2023-09-14T23:10:09,174][INFO ][o.e.c.c.JoinHelper       ] [elk01] failed to join {elk03}{lNN-V6I5S3mgyAOL1_taXg}{EM6lZBPcTpK4lqwdIq8udA}{elk03}{xx.xx.xx.224}{xx.xx.xx.224:9300}{cdfhilmrstw}{ml.allocated_processors_double=4.0, ml.machine_memory=16526131200, xpack.installed=true, ml.max_jvm_size=8262778880, ml.allocated_processors=4} with JoinRequest{sourceNode={elk01}{twbA5ovpSP-gAkeD65cmNg}{62VkkaWSR86H7-dlzTn1xg}{elk01}{xx.xx.xx.222}{xx.xx.xx.222:9300}{cdfhilmrstw}{ml.max_jvm_size=1073741824, ml.allocated_processors_double=8.0, xpack.installed=true, ml.machine_memory=16525099008, ml.allocated_processors=8}, minimumTerm=2609, optionalJoin=Optional[Join{term=2609, lastAcceptedTerm=10, lastAcceptedVersion=4237, sourceNode={elk01}{twbA5ovpSP-gAkeD65cmNg}{62VkkaWSR86H7-dlzTn1xg}{elk01}{xx.xx.xx.222}{xx.xx.xx.222:9300}{cdfhilmrstw}{ml.max_jvm_size=1073741824, ml.allocated_processors_double=8.0, xpack.installed=true, ml.machine_memory=16525099008, ml.allocated_processors=8}, targetNode={elk03}{lNN-V6I5S3mgyAOL1_taXg}{EM6lZBPcTpK4lqwdIq8udA}{elk03}{xx.xx.xx.224}{xx.xx.xx.224:9300}{cdfhilmrstw}{ml.allocated_processors_double=4.0, ml.machine_memory=16526131200, xpack.installed=true, ml.max_jvm_size=8262778880, ml.allocated_processors=4}}]}
org.elasticsearch.transport.RemoteTransportException: [elk03][xx.xx.xx.224:9300][internal:cluster/coordination/join]
Caused by: java.lang.IllegalStateException: index [.monitoring-es-7-2023.09.14/arNavb0JT92WnAajJMCyWQ] version not supported: 8.5.2 the node version is: 8.5.0
    at org.elasticsearch.cluster.coordination.JoinTaskExecutor.ensureIndexCompatibility(JoinTaskExecutor.java:268) ~[elasticsearch-8.5.0.jar:?]
    at org.elasticsearch.cluster.coordination.JoinTaskExecutor.lambda$addBuiltInJoinValidators$9(JoinTaskExecutor.java:341) ~[elasticsearch-8.5.0.jar:?]
    at org.elasticsearch.cluster.coordination.Coordinator.lambda$validateJoinRequest$13(Coordinator.java:663) ~[elasticsearch-8.5.0.jar:?]
    at java.util.ArrayList.forEach(ArrayList.java:1511) ~[?:?]
    at java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1092) ~[?:?]
    at org.elasticsearch.cluster.coordination.Coordinator.validateJoinRequest(Coordinator.java:663) ~[elasticsearch-8.5.0.jar:?]
    at org.elasticsearch.cluster.coordination.Coordinator$1.onResponse(Coordinator.java:609) ~[elasticsearch-8.5.0.jar:?]
    at org.elasticsearch.cluster.coordination.Coordinator$1.onResponse(Coordinator.java:604) ~[elasticsearch-8.5.0.jar:?]
    at org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:31) ~[elasticsearch-8.5.0.jar:?]
    at org.elasticsearch.transport.ClusterConnectionManager.lambda$connectToNodeOrRetry$1(ClusterConnectionManager.java:146) ~[elasticsearch-8.5.0.jar:?]
    at org.elasticsearch.action.ActionListener$DelegatingFailureActionListener.onResponse(ActionListener.java:245) ~[elasticsearch-8.5.0.jar:?]
    at org.elasticsearch.common.util.concurrent.ListenableFuture.notifyListenerDirectly(ListenableFuture.java:113) ~[elasticsearch-8.5.0.jar:?]
    at org.elasticsearch.common.util.concurrent.ListenableFuture.done(ListenableFuture.java:100) ~[elasticsearch-8.5.0.jar:?]
    at org.elasticsearch.common.util.concurrent.BaseFuture.set(BaseFuture.java:131) ~[elasticsearch-8.5.0.jar:?]
    at org.elasticsearch.common.util.concurrent.ListenableFuture.onResponse(ListenableFuture.java:139) ~[elasticsearch-8.5.0.jar:?]
    at org.elasticsearch.transport.ClusterConnectionManager.lambda$connectToNodeOrRetry$4(ClusterConnectionManager.java:253) ~[elasticsearch-8.5.0.jar:?]
    at org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:162) ~[elasticsearch-8.5.0.jar:?]
    at org.elasticsearch.action.ActionListener$RunAfterActionListener.onResponse(ActionListener.java:367) ~[elasticsearch-8.5.0.jar:?]
    at org.elasticsearch.action.ActionListener$MappedActionListener.onResponse(ActionListener.java:127) ~[elasticsearch-8.5.0.jar:?]
    at org.elasticsearch.transport.TransportService.lambda$handshake$6(TransportService.java:560) ~[elasticsearch-8.5.0.jar:?]
    at org.elasticsearch.action.ActionListener$DelegatingFailureActionListener.onResponse(ActionListener.java:245) ~[elasticsearch-8.5.0.jar:?]
    at org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:43) ~[elasticsearch-8.5.0.jar:?]
    at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1362) ~[elasticsearch-8.5.0.jar:?]
    at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1362) ~[elasticsearch-8.5.0.jar:?]
    at org.elasticsearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:369) ~[elasticsearch-8.5.0.jar:?]
    at org.elasticsearch.transport.InboundHandler$2.doRun(InboundHandler.java:361) ~[elasticsearch-8.5.0.jar:?]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:892) ~[elasticsearch-8.5.0.jar:?]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.5.0.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
    at java.lang.Thread.run(Thread.java:1589) ~[?:?]
```

Answer

Reading between the lines, so to speak, the key part of this log message is:

[elk01] failed to join {elk03}... Caused by: ... index [.monitoring-es-7-2023.09.14...] version not supported: 8.5.2 the node version is: 8.5.0

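What happened is that someone upgraded elk03 (and most likely elk02 as well) to v8.5.2 but left elk01 on 8.5.0. This kept working until elk03 created a new index today (the daily .monitoring-es-7-2023.09.14 index) and you restarted elk01. That index was created by an 8.5.2 node, so an 8.5.0 node cannot read it. When elk01 tried to rejoin, the join validation (JoinTaskExecutor.ensureIndexCompatibility in the stack trace) rejected it, and elk01 was not allowed back into the cluster.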
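The compatibility rule behind this error can be illustrated with a small model. The Python sketch below is a simplification for illustration only, not Elasticsearch's actual ensureIndexCompatibility code: it covers just the case seen here (an index created by a newer version than the joining node), ignores the separate check for indices that are too old, and the function names and sample index metadata are hypothetical.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Turn a version string like "8.5.2" into a comparable tuple (8, 5, 2)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def incompatible_indices(node_version: str, index_versions: Dict[str, str]) -> List[str]:
    """Return the indices that a node running `node_version` could not read.

    Simplified model (assumption): an index created by a version newer than
    the joining node blocks the join.
    """
    node = parse_version(node_version)
    return [
        name
        for name, created in sorted(index_versions.items())
        if parse_version(created) > node
    ]


def check_join(node_name: str, node_version: str, index_versions: Dict[str, str]) -> None:
    """Raise an error worded like the log message if any index is too new."""
    too_new = incompatible_indices(node_version, index_versions)
    if too_new:
        name = too_new[0]
        raise RuntimeError(
            f"{node_name}: index [{name}] version not supported: "
            f"{index_versions[name]} the node version is: {node_version}"
        )


if __name__ == "__main__":
    # Hypothetical cluster metadata: an older index plus the new monitoring index.
    cluster_indices = {
        "my-app-logs": "8.5.0",
        ".monitoring-es-7-2023.09.14": "8.5.2",
    }
    try:
        check_join("elk01", "8.5.0", cluster_indices)  # rejected, like in the log
    except RuntimeError as err:
        print(err)
    check_join("elk01", "8.5.2", cluster_indices)  # passes once elk01 is upgraded
    print("elk01 on 8.5.2 can join")
```

In this model, upgrading the node is the only way to make the check pass, because an index's creation version never goes down.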

Solution: upgrade elk01 to v8.5.2 and make sure all nodes run the same version.
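To check which versions the nodes in the cluster are running, you can query the _cat/nodes API with h=name,version. The Python sketch below is an illustration, not part of the original answer. It assumes the cluster answers plain HTTP without authentication, as in the question's curl call; the script name check_versions.py is hypothetical. Note that elk01 will not appear in this list while it cannot join, so its installed version has to be checked on the host itself.

```python
import sys
import urllib.request
from typing import Dict


def parse_cat_nodes(text: str) -> Dict[str, str]:
    """Parse `_cat/nodes?h=name,version` output: one "name version" pair per line."""
    versions = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            name, version = fields
            versions[name] = version
    return versions


def versions_match(versions: Dict[str, str]) -> bool:
    """True when every listed node reports the same Elasticsearch version."""
    return len(set(versions.values())) <= 1


def main(base_url: str) -> int:
    # Assumption: no TLS or credentials are needed (security disabled, as in the question).
    url = base_url.rstrip("/") + "/_cat/nodes?h=name,version"
    with urllib.request.urlopen(url, timeout=10) as resp:
        versions = parse_cat_nodes(resp.read().decode("utf-8"))
    for name, version in sorted(versions.items()):
        print(f"{name}: {version}")
    if not versions_match(versions):
        print("Version mismatch: upgrade the older nodes so all nodes match.")
        return 1
    return 0


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: check_versions.py http://elk02:9200")
    else:
        sys.exit(main(sys.argv[1]))
```

For example, running it as `python check_versions.py http://elk02:9200` before and after upgrading elk01 should show whether all nodes that are part of the cluster report the same version.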
