中层(dc / os)代理重新连接失败

问题描述 投票:4回答:1

我根据此指南https://dcos.io/docs/1.8/administration/installing/custom/cli/设置了小型测试中球群一切进展顺利。集群中只有3个节点,一个用于引导,一个主节点(10.7.1.12)和一个代理(10.7.1.13)。

但是在使用代理节点重新启动计算机后,主节点enter image description here不再可见。

/var/log/mesos/mesos-agent.log中,最后一个输入在重新启动之前具有时间戳。我正在尝试从https://dcos.io/docs/1.8/administration/installing/custom/troubleshooting/开始的所有步骤,但没有任何更改。

这里是代理断开连接后[sudo journalctl -u dcos-mesos-master)来自主服务器的日志

lut 06 15:48:14 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:48:14.556001  2671 master.cpp:1245] Agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13) disconnected
lut 06 15:48:14 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:48:14.556089  2671 master.cpp:2784] Disconnecting agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13)
lut 06 15:48:14 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:48:14.556170  2671 master.cpp:2803] Deactivating agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13)
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: W0206 15:53:16.926198  2670 master.cpp:5334] Shutting down agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13) with message 'health check timed out'
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.926230  2670 master.cpp:6617] Removing agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13): health check timed out
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.926507  2670 master.cpp:6910] Removing task 93f4b075-1338-4a84-afd6-6932cfe44c30 with resources mem(arangodb31, arangodb3):2048; cpus(arangodb31, arangodb3):0.25; disk(arangodb31, arangodb3)[AGENCY_991972e5-2d83-4710-ba3c-de8cf02303ab:myPersistentVolume]:2048; ports(arangodb31, arangodb3):[1026-1026] of framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0004 on agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13)
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.926695  2670 master.cpp:6910] Removing task 644b59eb-fb20-43fd-a7c1-b1d9406cbfcb with resources mem(arangodb3, arangodb3):2048; cpus(arangodb3, arangodb3):0.25; disk(arangodb3, arangodb3)[AGENCY_0c76702f-ae8b-423c-83a8-1b6e2af8b723:myPersistentVolume]:2048; ports(arangodb3, arangodb3):[1025-1025] of framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0002 on agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13)
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928460  2670 master.cpp:6736] Removed agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 (10.7.1.13): health check timed out
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928472  2670 master.cpp:5197] Sending status update TASK_LOST for task 93f4b075-1338-4a84-afd6-6932cfe44c30 of framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0004 'Slave 10.7.1.13 removed: health check timed out'
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: W0206 15:53:16.928486  2670 master.hpp:2113] Master attempted to send message to disconnected framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0004 (arangodb3-1) at [email protected]:25366
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928611  2670 master.cpp:5197] Sending status update TASK_LOST for task 644b59eb-fb20-43fd-a7c1-b1d9406cbfcb of framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0002 'Slave 10.7.1.13 removed: health check timed out'
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: W0206 15:53:16.928638  2670 master.hpp:2113] Master attempted to send message to disconnected framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0002 (arangodb3) at [email protected]:19866
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928747  2670 master.cpp:6759] Notifying framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0003 (arangodb3-standalone) at [email protected]:3583 of lost agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 (10.7.1.13) after recovering
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: W0206 15:53:16.928761  2670 master.hpp:2113] Master attempted to send message to disconnected framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0003 (arangodb3-standalone) at [email protected]:3583
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928894  2670 master.cpp:6759] Notifying framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0002 (arangodb3) at [email protected]:19866 of lost agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 (10.7.1.13) after recovering
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: W0206 15:53:16.928905  2670 master.hpp:2113] Master attempted to send message to disconnected framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0002 (arangodb3) at [email protected]:19866
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928921  2670 master.cpp:6759] Notifying framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0004 (arangodb3-1) at [email protected]:25366 of lost agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 (10.7.1.13) after recovering
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: W0206 15:53:16.928928  2670 master.hpp:2113] Master attempted to send message to disconnected framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0004 (arangodb3-1) at [email protected]:25366
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928941  2670 master.cpp:6759] Notifying framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0001 (metronome) at [email protected]:41077 of lost agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 (10.7.1.13) after recovering
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928963  2670 master.cpp:6759] Notifying framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0000 (marathon) at [email protected]:43643 of lost agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 (10.7.1.13) after recovering

其余期刊(journalctl ...)为空。

ZooKeeper日志enter image description here中也存在此错误

对于任何进一步调查的建议,我将不胜感激。

编辑:

我设法通过启动dcos-mesos-slave service手动运行它的代理节点(在必须启动dcos-spartandcos-gen-resolvconf服务之前)。有什么想法为什么不能自动启动?

mesos mesosphere dcos
1个回答
0
投票

关于为什么它没有自动启动的任何想法?

根据rules for using systemd reliably systemd单位彼此不依赖,因此您需要手动启动所有内容。

  • Requires=Wants=是不允许的。如果依赖的事物失败,则依赖它的事物将永远不会尝试再次启动。
  • Before=After=不建议使用。它们不是有力的保证,软件需要检查先决条件是否正常运行?
© www.soinside.com 2019 - 2024. All rights reserved.