mesos 1.2.1和1.3.1从属docker作业正常退出,但报告为失败

7hiiyaii  于 2021-06-26  发布在  Mesos
关注(0)|答案(0)|浏览(199)

我的mesos版本最近从0.28升级到了1.2.1。
正在使用chronos计划作业。我的docker作业正被正确调用,但仍在获取任务失败的错误事件,它已完成,退出状态为零。
这始终是可复制的。我尝试了多个mesos版本,但仍然没有成功。不知道我是否遗漏了什么。
操作系统详细信息:-

Kernel  - 3.8.13-98.7.1.el7uek
OS - OL 7.3
Sanpshot - 7-2017.6.4

由于它不是最新的内核,我在chronos job环境变量中添加了以下内容,因为我无法安装最新的docker版本。

{
          "name":"DOCKER_API_VERSION",
          "value":"1.22"
      }

启动带标志的“mesos docker executor”

'--container="mesos-81cb9c2a-d18b-4127-872b-2a5676dfb314-S0.97dc2c67-5d69-4a8c-b4e1-ba15807697cf" 
--docker="docker" 
--docker_socket="/var/run/docker.sock" 
--help="false" 
--initialize_driver_logging="true" 
--launcher_dir="/usr/libexec/mesos" --logbufsecs="0" 
--logging_level="INFO" 
--mapped_directory="/mnt/mesos/sandbox" 
--quiet="false" 
--sandbox_directory="/mesos-data/slave-1/slaves/81cb9c2a-d18b-4127-872b-2a5676dfb314-S0/docker/links/97dc2c67-5d69-4a8c-b4e1-ba15807697cf" 
--stop_timeout="0ns"'

mesos从属日志:-

I0906 14:05:00.958442     9 slave.cpp:1625] Got assigned task 'ct:1504706700007:0:Job_Task_Test:' for framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
I0906 14:05:00.958544     9 slave.cpp:6386] Checkpointing FrameworkInfo to '/mesos-data/slave-1/meta/slaves/81cb9c2a-d18b-4127-872b-2a5676dfb314-S0/frameworks/5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000/framework.info'
I0906 14:05:00.958868     9 slave.cpp:6397] Checkpointing framework pid 'scheduler-766fa517-8ca6-430e-b044-7fa7e9b339b8@20.426.45.305:43144' to '/mesos-data/slave-1/meta/slaves/81cb9c2a-d18b-4127-872b-2a5676dfb314-S0/frameworks/5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000/framework.pid'
I0906 14:05:00.959430     9 slave.cpp:1785] Launching task 'ct:1504706700007:0:Job_Task_Test:' for framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
I0906 14:05:00.966035     9 paths.cpp:547] Trying to chown '/mesos-data/slave-1/slaves/81cb9c2a-d18b-4127-872b-2a5676dfb314-S0/frameworks/5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000/executors/ct:1504706700007:0:Job_Task_Test:/runs/97dc2c67-5d69-4a8c-b4e1-ba15807697cf' to user 'root'
I0906 14:05:00.966223     9 slave.cpp:6903] Checkpointing ExecutorInfo to '/mesos-data/slave-1/meta/slaves/81cb9c2a-d18b-4127-872b-2a5676dfb314-S0/frameworks/5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000/executors/ct:1504706700007:0:Job_Task_Test:/executor.info'
I0906 14:05:00.966648     9 slave.cpp:6479] Launching executor 'ct:1504706700007:0:Job_Task_Test:' of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000 with resources cpus(*)(allocated: *):0.1; mem(*)(allocated: *):32 in work directory '/mesos-data/slave-1/slaves/81cb9c2a-d18b-4127-872b-2a5676dfb314-S0/frameworks/5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000/executors/ct:1504706700007:0:Job_Task_Test:/runs/97dc2c67-5d69-4a8c-b4e1-ba15807697cf'
I0906 14:05:00.967015     9 slave.cpp:6926] Checkpointing TaskInfo to '/mesos-data/slave-1/meta/slaves/81cb9c2a-d18b-4127-872b-2a5676dfb314-S0/frameworks/5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000/executors/ct:1504706700007:0:Job_Task_Test:/runs/97dc2c67-5d69-4a8c-b4e1-ba15807697cf/tasks/ct:1504706700007:0:Job_Task_Test:/task.info'
I0906 14:05:00.967300     9 slave.cpp:2118] Queued task 'ct:1504706700007:0:Job_Task_Test:' for executor 'ct:1504706700007:0:Job_Task_Test:' of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
I0906 14:05:00.967337     9 slave.cpp:884] Successfully attached file '/mesos-data/slave-1/slaves/81cb9c2a-d18b-4127-872b-2a5676dfb314-S0/frameworks/5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000/executors/ct:1504706700007:0:Job_Task_Test:/runs/97dc2c67-5d69-4a8c-b4e1-ba15807697cf'
I0906 14:05:00.967501    12 docker.cpp:1165] Starting container '97dc2c67-5d69-4a8c-b4e1-ba15807697cf' for task 'ct:1504706700007:0:Job_Task_Test:' (and executor 'ct:1504706700007:0:Job_Task_Test:') of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
I0906 14:05:00.968152     9 fetcher.cpp:353] Starting to fetch URIs for container: 97dc2c67-5d69-4a8c-b4e1-ba15807697cf, directory: /mesos-data/slave-1/slaves/81cb9c2a-d18b-4127-872b-2a5676dfb314-S0/docker/links/97dc2c67-5d69-4a8c-b4e1-ba15807697cf
I0906 14:05:00.969274    16 docker.cpp:1348] Running docker -H unix:///var/run/docker.sock pull docker.xyz.com/test/job-action:latest
I0906 14:05:01.165390     9 docker.cpp:1274] Running docker -H unix:///var/run/docker.sock inspect docker.xyz.com/test/job-action:latest
I0906 14:05:01.267083    13 docker.cpp:465] Docker pull docker.xyz.com/test/job-action:latest completed
I0906 14:05:01.267663    13 docker.cpp:1514] Launching 'mesos-docker-executor' with flags '--container="mesos-81cb9c2a-d18b-4127-872b-2a5676dfb314-S0.97dc2c67-5d69-4a8c-b4e1-ba15807697cf" --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" --initialize_driver_logging="true" --launcher_dir="/usr/libexec/mesos" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/mesos-data/slave-1/slaves/81cb9c2a-d18b-4127-872b-2a5676dfb314-S0/docker/links/97dc2c67-5d69-4a8c-b4e1-ba15807697cf" --stop_timeout="0ns"'
I0906 14:05:01.269182    13 docker.cpp:803] Checkpointing pid 147 to '/mesos-data/slave-1/meta/slaves/81cb9c2a-d18b-4127-872b-2a5676dfb314-S0/frameworks/5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000/executors/ct:1504706700007:0:Job_Task_Test:/runs/97dc2c67-5d69-4a8c-b4e1-ba15807697cf/pids/forked.pid'
I0906 14:05:01.308346    15 slave.cpp:3385] Got registration for executor 'ct:1504706700007:0:Job_Task_Test:' of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000 from executor(1)@20.426.45.305:48379
I0906 14:05:01.308658    15 slave.cpp:3471] Checkpointing executor pid 'executor(1)@20.426.45.305:48379' to '/mesos-data/slave-1/meta/slaves/81cb9c2a-d18b-4127-872b-2a5676dfb314-S0/frameworks/5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000/executors/ct:1504706700007:0:Job_Task_Test:/runs/97dc2c67-5d69-4a8c-b4e1-ba15807697cf/pids/libprocess.pid'
I0906 14:05:01.309453    15 docker.cpp:1608] Ignoring updating container 97dc2c67-5d69-4a8c-b4e1-ba15807697cf because resources passed to update are identical to existing resources
I0906 14:05:01.309587    15 slave.cpp:2331] Sending queued task 'ct:1504706700007:0:Job_Task_Test:' to executor 'ct:1504706700007:0:Job_Task_Test:' of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000 at executor(1)@20.426.45.305:48379
I0906 14:05:02.914752    11 slave.cpp:3816] Handling status update TASK_RUNNING (UUID: 72a301e2-f3d3-4e24-8e43-fc5ee44b3730) for task ct:1504706700007:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000 from executor(1)@20.426.45.305:48379
I0906 14:05:02.915037    11 status_update_manager.cpp:323] Received status update TASK_RUNNING (UUID: 72a301e2-f3d3-4e24-8e43-fc5ee44b3730) for task ct:1504706700007:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
I0906 14:05:02.915063    11 status_update_manager.cpp:500] Creating StatusUpdate stream for task ct:1504706700007:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
I0906 14:05:02.915396    11 status_update_manager.cpp:832] Checkpointing UPDATE for status update TASK_RUNNING (UUID: 72a301e2-f3d3-4e24-8e43-fc5ee44b3730) for task ct:1504706700007:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
I0906 14:05:02.915513    11 status_update_manager.cpp:377] Forwarding update TASK_RUNNING (UUID: 72a301e2-f3d3-4e24-8e43-fc5ee44b3730) for task ct:1504706700007:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000 to the agent
I0906 14:05:02.915587    11 slave.cpp:4256] Forwarding the update TASK_RUNNING (UUID: 72a301e2-f3d3-4e24-8e43-fc5ee44b3730) for task ct:1504706700007:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000 to master@20.426.45.305:5050
I0906 14:05:02.915688    11 slave.cpp:4150] Status update manager successfully handled status update TASK_RUNNING (UUID: 72a301e2-f3d3-4e24-8e43-fc5ee44b3730) for task ct:1504706700007:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
I0906 14:05:02.915700    11 slave.cpp:4166] Sending acknowledgement for status update TASK_RUNNING (UUID: 72a301e2-f3d3-4e24-8e43-fc5ee44b3730) for task ct:1504706700007:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000 to executor(1)@20.426.45.305:48379
I0906 14:05:03.628334    10 status_update_manager.cpp:395] Received status update acknowledgement (UUID: 72a301e2-f3d3-4e24-8e43-fc5ee44b3730) for task ct:1504706700007:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
I0906 14:05:03.628398    10 status_update_manager.cpp:832] Checkpointing ACK for status update TASK_RUNNING (UUID: 72a301e2-f3d3-4e24-8e43-fc5ee44b3730) for task ct:1504706700007:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
I0906 14:05:03.628463    10 slave.cpp:3105] Status update manager successfully handled status update acknowledgement (UUID: 72a301e2-f3d3-4e24-8e43-fc5ee44b3730) for task ct:1504706700007:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
I0906 14:05:04.256636    14 slave.cpp:5731] Querying resource estimator for oversubscribable resources
I0906 14:05:04.256691    14 slave.cpp:5745] Received oversubscribable resources {} from the resource estimator
I0906 14:05:04.944047    10 slave.cpp:4346] Received ping from slave-observer(1)@20.426.45.305:5050
I0906 14:05:11.049429    12 slave.cpp:3816] Handling status update TASK_FAILED (UUID: 83eceb05-6a94-4660-bdb5-3cbc2b166b1b) for task ct:1504706700007:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000 from executor(1)@20.426.45.305:48379
I0906 14:05:11.049636    12 docker.cpp:987] Running docker -H unix:///var/run/docker.sock inspect mesos-81cb9c2a-d18b-4127-872b-2a5676dfb314-S0.97dc2c67-5d69-4a8c-b4e1-ba15807697cf
I0906 14:05:11.136464    13 status_update_manager.cpp:323] Received status update TASK_FAILED (UUID: 83eceb05-6a94-4660-bdb5-3cbc2b166b1b) for task ct:1504706700007:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
I0906 14:05:11.136502    13 status_update_manager.cpp:832] Checkpointing UPDATE for status update TASK_FAILED (UUID: 83eceb05-6a94-4660-bdb5-3cbc2b166b1b) for task ct:1504706700007:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
I0906 14:05:11.136572    13 status_update_manager.cpp:377] Forwarding update TASK_FAILED (UUID: 83eceb05-6a94-4660-bdb5-3cbc2b166b1b) for task ct:1504706700007:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000 to the agent
I0906 14:05:11.136642    12 slave.cpp:4256] Forwarding the update TASK_FAILED (UUID: 83eceb05-6a94-4660-bdb5-3cbc2b166b1b) for task ct:1504706700007:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000 to master@20.426.45.305:5050
I0906 14:05:11.136739    12 slave.cpp:4150] Status update manager successfully handled status update TASK_FAILED (UUID: 83eceb05-6a94-4660-bdb5-3cbc2b166b1b) for task ct:1504706700007:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
I0906 14:05:11.136752    12 slave.cpp:4166] Sending acknowledgement for status update TASK_FAILED (UUID: 83eceb05-6a94-4660-bdb5-3cbc2b166b1b) for task ct:1504706700007:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000 to executor(1)@20.426.45.305:48379
I0906 14:05:11.152230    13 status_update_manager.cpp:395] Received status update acknowledgement (UUID: 83eceb05-6a94-4660-bdb5-3cbc2b166b1b) for task ct:1504706700007:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
I0906 14:05:11.152271    13 status_update_manager.cpp:832] Checkpointing ACK for status update TASK_FAILED (UUID: 83eceb05-6a94-4660-bdb5-3cbc2b166b1b) for task ct:1504706700007:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
I0906 14:05:11.152307    13 status_update_manager.cpp:531] Cleaning up status update stream for task ct:1504706700007:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
I0906 14:05:11.152454    13 slave.cpp:3105] Status update manager successfully handled status update acknowledgement (UUID: 83eceb05-6a94-4660-bdb5-3cbc2b166b1b) for task ct:1504706700007:0:Job_Task_Test: of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
I0906 14:05:11.152475    13 slave.cpp:6882] Completing task ct:1504706700007:0:Job_Task_Test:
I0906 14:05:12.059929    13 slave.cpp:4388] Got exited event for executor(1)@20.426.45.305:48379
I0906 14:05:12.140215    14 docker.cpp:2397] Executor for container 97dc2c67-5d69-4a8c-b4e1-ba15807697cf has exited
I0906 14:05:12.140244    14 docker.cpp:2091] Destroying container 97dc2c67-5d69-4a8c-b4e1-ba15807697cf
I0906 14:05:12.140283    14 docker.cpp:2218] Running docker stop on container 97dc2c67-5d69-4a8c-b4e1-ba15807697cf
I0906 14:05:12.140717    11 slave.cpp:4768] Executor 'ct:1504706700007:0:Job_Task_Test:' of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000 exited with status 0
I0906 14:05:12.140748    11 slave.cpp:4868] Cleaning up executor 'ct:1504706700007:0:Job_Task_Test:' of framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000 at executor(1)@20.426.45.305:48379
I0906 14:05:12.141062    11 slave.cpp:4956] Cleaning up framework 5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000
I0906 14:05:12.141129    11 gc.cpp:55] Scheduling '/mesos-data/slave-1/slaves/81cb9c2a-d18b-4127-872b-2a5676dfb314-S0/frameworks/5175f6c9-0617-4145-ab46-3b7e64dc67ea-0000/executors/ct:1504706700007:0:Job_Task_Test:/runs/97dc2c67-5d69-4a8c-b4e1-ba15807697cf' for gc 6.99999836858667days in the future

执行人标准:-

I0906 13:25:02.165338  3399 exec.cpp:162] Version: 1.2.1
I0906 13:25:02.169983  3405 exec.cpp:237] Executor registered on agent f20ab78e-acd3-407a-b1b6-47d67a947eff-S1

mesos版本1.3.1:-

I0906 16:15:02.975457   580 exec.cpp:162] Version: 1.3.1
I0906 16:15:02.979501   586 exec.cpp:237] Executor registered on agent 920fa575-b534-48a8-8305-61f6097f2f49-S1
E0906 16:15:13.523074   589 process.cpp:956] Failed to accept socket: future discarded

我正在通过chronos安排 Docker 的工作。docker作业运行良好,我可以在mesos标准输出日志中看到以下内容。部分日志:-
执行人标准:-

Registered docker executor on <hostname>
Starting task ct:1504704300002:0:Job_Task_Test:
.....
.....
.....
: success: true
: Stopping HTTP executor
: Started Application in 6.734 seconds (JVM running for 7.642)
s.c.a.AnnotationConfigApplicationContext : Closing org.springframework.context.annotation.AnnotationConfigApplicationContext@bebdb06: startup date [Wed Sep 06 17:25:07 UTC 2017]; root of context hierarchy
2017-09-06 17:25:14.050 INFO 1 — [ Thread-1] o.s.c.support.DefaultLifecycleProcessor : Stopping beans in phase 0
2017-09-06 17:25:14.051 INFO 1 — [ Thread-1] o.s.j.e.a.AnnotationMBeanExporter : Unregistering JMX-exposed beans on shutdown
2017-09-06 17:25:14.051 DEBUG 1 — [ Thread-1] h.i.c.PoolingHttpClientConnectionManager : Connection manager is shutting down
2017-09-06 17:25:14.051 DEBUG 1 — [ Thread-1] h.i.c.DefaultManagedHttpClientConnection : http-outgoing-0: Close connection
2017-09-06 17:25:14.052 DEBUG 1 — [ Thread-1] h.i.c.PoolingHttpClientConnectionManager : Connection manager shut down

docker作业是一个java进程,它调用远程服务并最终通过调用system.exit(0)退出;我们在docker作业中打印以下内容。

Expected status: 200, Actual status: 200, success: true

我们发现任务执行没有问题。
通过执行以下命令,状态代码为“exited(0)2 minutes ago”。
Docker ps-a

CONTAINER ID        IMAGE                  COMMAND    CREATED      
                                                                                       STATUS                         PORTS               NAMES
789ec59e32442        docker-test:latest   "/bin/sh -c 'java -ja"   2 minutes ago       Exited (0) 2 minutes ago                           mesos-05fb536b-asdff-3d634c4ed860-S1.be003d0e-7701-4020-84be-234643565244

谢谢,

暂无答案!

目前还没有任何答案,快来回答吧!

相关问题