YARN reports Flink job as finished and succeeded when the Flink job failed

Posted by hgc7kmma on 2021-06-21 in Flink


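I am running a Flink job on YARN. We submit the job to YARN from the command line with "flink run". One day the job hit an exception, and because we had not enabled a Flink restart strategy, it simply failed. However, the YARN application list showed the job status as "succeed", whereas we expected "failed".
Flink CLI log: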

06/12/2018 03:13:37 FlatMap (getTagStorageMapper.flatMap)(23/32) switched to CANCELED 
06/12/2018 03:13:37 GroupReduce (ResultReducer.reduceGroup)(31/32) switched to CANCELED 
06/12/2018 03:13:37 FlatMap (SubClassEDFJoinMapper.flatMap)(29/32) switched to CANCELED 
06/12/2018 03:13:37 CHAIN DataSource (SubClassInventory.AvroInputFormat.createInput) -> FlatMap (SubClassInventoryMapper.flatMap)(27/32) switched to CANCELED 
06/12/2018 03:13:37 GroupReduce (OutputReducer.reduceGroup)(28/32) switched to CANCELED 
06/12/2018 03:13:37 CHAIN DataSource (SubClassInventory.AvroInputFormat.createInput) -> FlatMap (BIMBQMInstrumentMapper.flatMap)(27/32) switched to CANCELED 
06/12/2018 03:13:37 GroupReduce (BIMBQMGovCorpReduce.reduceGroup)(30/32) switched to CANCELED 
06/12/2018 03:13:37 FlatMap (BIMBQMEVMJoinMapper.flatMap)(32/32) switched to CANCELED 
06/12/2018 03:13:37 Job execution switched to status FAILED.
No JobSubmissionResult returned, please make sure you called ExecutionEnvironment.execute()
2018-06-12 03:13:37,625 INFO  org.apache.flink.yarn.YarnClusterClient                       - Sending shutdown request to the Application Master
2018-06-12 03:13:37,625 INFO  org.apache.flink.yarn.YarnClusterClient                       - Start application client.
2018-06-12 03:13:37,630 INFO  org.apache.flink.yarn.ApplicationClient                       - Notification about new leader address akka.tcp://flink@ip-10-97-46-149.tr-fr-nonprod.aws-int.thomsonreuters.com:45663/user/jobmanager with session ID 00000000-0000-0000-0000-000000000000.
2018-06-12 03:13:37,632 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.
2018-06-12 03:13:37,633 INFO  org.apache.flink.yarn.ApplicationClient                       - Received address of new leader akka.tcp://flink@ip-10-97-46-149.tr-fr-nonprod.aws-int.thomsonreuters.com:45663/user/jobmanager with session ID 00000000-0000-0000-0000-000000000000.
2018-06-12 03:13:37,634 INFO  org.apache.flink.yarn.ApplicationClient                       - Disconnect from JobManager null.
2018-06-12 03:13:37,635 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@ip-10-97-46-149.tr-fr-nonprod.aws-int.thomsonreuters.com:45663/user/jobmanager.
2018-06-12 03:13:37,688 INFO  org.apache.flink.yarn.ApplicationClient                       - Successfully registered at the ResourceManager using JobManager Actor[akka.tcp://flink@ip-10-97-46-149.tr-fr-nonprod.aws-int.thomsonreuters.com:45663/user/jobmanager#182802345]
2018-06-12 03:13:38,648 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.
2018-06-12 03:13:39,480 INFO  org.apache.flink.yarn.YarnClusterClient                       - Application application_1528772982594_0001 finished with state FINISHED and final state SUCCEEDED at 1528773218662
2018-06-12 03:13:39,480 INFO  org.apache.flink.yarn.YarnClusterClient                       - YARN Client is shutting down
2018-06-12 03:13:39,582 INFO  org.apache.flink.yarn.ApplicationClient                       - Stopped Application client.
2018-06-12 03:13:39,583 INFO  org.apache.flink.yarn.ApplicationClient                       - Disconnect from JobManager Actor[akka.tcp://flink@ip-10-97-46-149.tr-fr-nonprod.aws-int.thomsonreuters.com:45663/user/jobmanager#182802345].
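
The contradiction is visible in the CLI log above: one line says "Job execution switched to status FAILED." while a later line says the YARN application ended with "final state SUCCEEDED". As a rough sketch only, assuming the log line formats are exactly as shown here, a small script could flag this mismatch after each run. The function names `summarize` and `statuses_disagree` are hypothetical helpers for illustration, not part of Flink or YARN.

```python
import re

# Patterns assume the exact CLI log line formats shown in the log above.
JOB_STATUS = re.compile(r"Job execution switched to status (\w+)")
YARN_FINAL = re.compile(r"finished with state (\w+) and final state (\w+)")


def summarize(log_text):
    """Return (last Flink job status, YARN final state) found in a CLI log.

    Either element is None when the corresponding line is absent.
    """
    job_status = None
    yarn_final = None
    for line in log_text.splitlines():
        match = JOB_STATUS.search(line)
        if match:
            job_status = match.group(1)
        match = YARN_FINAL.search(line)
        if match:
            yarn_final = match.group(2)
    return job_status, yarn_final


def statuses_disagree(log_text):
    """True when the Flink job FAILED but YARN recorded SUCCEEDED."""
    job_status, yarn_final = summarize(log_text)
    return job_status == "FAILED" and yarn_final == "SUCCEEDED"
```

Run against the CLI log in this post, `statuses_disagree` would return True, which is the situation being asked about. This only detects the mismatch from the client side; it does not explain or change what YARN records.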

Flink JobManager log:

FlatMap (BIMBQMEVMJoinMapper.flatMap) (32/32) (67a002e07fe799c1624a471340c8cf9d) switched from CANCELING to CANCELED.
Try to restart or fail the job Flink Java Job at Tue Jun 12 03:13:17 UTC 2018 (1086cedb3617feeee8aace29a7fc6bd0) if no longer possible.
Requesting new TaskManager container with 8192 megabytes memory. Pending requests: 1
Job Flink Java Job at Tue Jun 12 03:13:17 UTC 2018 (1086cedb3617feeee8aace29a7fc6bd0) switched from state FAILING to FAILED.
Could not restart the job Flink Java Job at Tue Jun 12 03:13:17 UTC 2018 (1086cedb3617feeee8aace29a7fc6bd0) because the restart strategy prevented it.
Unregistered task manager ip-10-97-44-186/10.97.44.186. Number of registered task managers 31. Number of available slots 31
Stopping JobManager with final application status SUCCEEDED and diagnostics: Flink YARN Client requested shutdown
Shutting down cluster with status SUCCEEDED : Flink YARN Client requested shutdown
Unregistering application from the YARN Resource Manager
Waiting for application to be successfully unregistered.

Can anyone help me understand why YARN reports my Flink job as "succeed"?

yarn apache-flink

Source: https://stackoverflow.com/questions/50854208/yarn-report-flink-job-as-finished-and-succeed-when-flink-job-failure