Mesos slaves shut down if unused

Posted by yi0zb3m4 on 2021-06-21 in Mesos

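I have a Mesos setup with 3 masters and 5 slaves. The servers communicate fine, a master is elected, and the slaves connect without trouble. However, any slave that is idle and running no applications first gets a "health check failed" on the master (as far as I can tell, the slave itself does not complain about anything or lose its connection). Some time later the master complains about "Status update from unknown slave" and shuts the slave down. This happens to all idle slaves, while the slaves that have running processes keep working without problems.
Does anyone know how to fix this?
Below is an excerpt from the slave log. I tried to clean it up a bit.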

I0225 18:02:14.077440  9029 slave.cpp:3053] Current usage 60.93%. Max allowed age: 2.035008507120139days
I0225 18:02:28.615249  9025 slave.cpp:2088] Handling status update TASK_KILLED (UUID: id) for task develop.id of framework fwid from executor(1)@ip1:45193
W0225 18:02:28.615352  9025 slave.cpp:2121] Could not find the executor for status update TASK_KILLED (UUID: id) for task develop.id of framework fwid
I0225 18:02:28.615947  9031 status_update_manager.cpp:320] Received status update TASK_KILLED (UUID: id) for task develop.id of framework fwid
I0225 18:02:28.616165  9031 status_update_manager.cpp:373] Forwarding status update TASK_KILLED (UUID: id) for task develop.id of framework fwid to master@ip2:5050
I0225 18:02:28.616334  9031 slave.cpp:2252] Sending acknowledgement for status update TASK_KILLED (UUID: id) for task develop.id of framework fwid to executor(1)@ip1:45193
I0225 18:02:28.618074  9025 slave.cpp:508] Slave asked to shut down by master@ip2:5050 because 'Status update from unknown slave'
I0225 18:02:28.618239  9025 slave.cpp:1406] Asked to shut down framework fwid by master@ip2:5050
I0225 18:02:28.618273  9025 slave.cpp:1431] Shutting down framework fwid
I0225 18:02:28.618387  9025 slave.cpp:2878] Shutting down executor 'develop.id' of framework fwid
I0225 18:02:29.336168  9027 slave.cpp:2088] Handling status update TASK_KILLED (UUID: id) for task develop.id of framework fwid from executor(1)@ip1:42376
W0225 18:02:29.336278  9027 slave.cpp:2112] Ignoring status update TASK_KILLED (UUID: id) for task develop.id of framework fwid for terminating framework fwid
I0225 18:02:30.338100  9030 containerizer.cpp:997] Executor for container 'id' has exited
I0225 18:02:30.338213  9030 containerizer.cpp:882] Destroying container 'id'
I0225 18:02:30.343300  9025 slave.cpp:2596] Executor 'develop.id' of framework fwid exited with status 0
I0225 18:02:30.343474  9025 slave.cpp:2732] Cleaning up executor 'develop.id' of framework fwid
I0225 18:02:30.343935  9029 gc.cpp:56] Scheduling '/mnt/spark/mesos/slaves/S12/frameworks/fwid/executors/develop.id/runs/id' for gc 6.99999602148148days in the future
I0225 18:02:30.344023  9025 slave.cpp:2807] Cleaning up framework fwid
I0225 18:02:30.344100  9029 gc.cpp:56] Scheduling '/mnt/spark/mesos/slaves/S12/frameworks/fwid/executors/develop.id' for gc 6.9999960201037days in the future
I0225 18:02:30.344174  9029 gc.cpp:56] Scheduling '/mnt/spark/mesos/meta/slaves/S12/frameworks/fwid/executors/develop.id/runs/id' for gc 6.99999601960593days in the future
I0225 18:02:30.344216  9025 slave.cpp:466] Slave terminating

ijnw1ujt1#

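The "health check failed" message means the master has been unable to ping the slave (or at least has not received a pong) for the past minute and a half. Do you have intermittent network problems? Have you tried pinging the slave from the master? Is there a firewall problem on the slave on port 5051 (or whatever port you are using)?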
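
One rough way to test the firewall question from a master host is to try opening a TCP connection to each slave's port. The sketch below is only an illustration: the helper names `can_connect` and `unreachable_slaves` are made up for this example, and the default port 5051 is the one mentioned above. A successful connect only shows that the port is reachable at that moment. It does not reproduce the master's own ping/pong exchange, so an intermittent problem may need repeated runs to show up.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def unreachable_slaves(hosts, port=5051, timeout=3.0):
    """Return the hosts (hypothetical slave addresses) whose port refused or timed out."""
    return [h for h in hosts if not can_connect(h, port, timeout)]
```

Run on a master, `unreachable_slaves([...])` with your own slave addresses returns the ones whose port did not accept a connection. An empty list suggests that a firewall is not blocking the port, at least at the time of the check.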
