How do Spark scheduler pools work when running on YARN?

Posted by wmtdaxz3 on 2021-05-29 in Hadoop

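I have a mix of Spark versions (1.6, 2.0, 2.1) deployed on YARN (Hadoop 2.6.0 / CDH 5.5). I'm trying to guarantee that a certain application is never starved of resources on our YARN cluster, regardless of what else is running there.
I've enabled the shuffle service and set up some Fair Scheduler pools as described in the Spark docs. I created a separate pool for the high-priority application that I never want starved of resources, and gave it a minShare of resources: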

<?xml version="1.0"?>
<allocations>
  <pool name="default">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
  <pool name="high_priority">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>24</minShare>
  </pool>
</allocations>

When I run a Spark application on our YARN cluster, I can see that the pools I configured are recognized:

17/04/04 11:38:20 INFO scheduler.FairSchedulableBuilder: Created pool default, schedulingMode: FAIR, minShare: 0, weight: 1
17/04/04 11:38:20 INFO scheduler.FairSchedulableBuilder: Created pool high_priority, schedulingMode: FAIR, minShare: 24, weight: 1

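However, I don't see my application using the new high_priority pool, even though I am setting spark.scheduler.pool in my call to spark-submit. This means that when the cluster is saturated by regular activity, my high-priority application cannot get the resources it needs: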

17/04/04 11:39:49 INFO cluster.YarnScheduler: Adding task set 0.0 with 1 tasks
17/04/04 11:39:50 INFO scheduler.FairSchedulableBuilder: Added task set TaskSet_0 tasks to pool default
17/04/04 11:39:50 INFO spark.ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 1)
17/04/04 11:40:05 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

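What am I missing? My coworkers and I tried enabling preemption in YARN, but that did nothing. Then we realized that YARN has a concept very similar to Spark scheduler pools, called YARN queues. So now we're not sure whether the two concepts conflict.
How can we get our high-priority pool to work as expected? Is there some kind of conflict between Spark scheduler pools and YARN queues?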

hadoop yarn apache-spark scheduling

Source: https://stackoverflow.com/questions/43239921/how-do-spark-scheduler-pools-work-when-running-on-yarn

1 Answer

pepwfjgg1#

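Someone on the spark-users list clarified something that explains why I wasn't getting what I expected: Spark scheduler pools manage resources within an application, while YARN queues manage resources across applications. I need the latter and was mistakenly using the former.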
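
To make "the latter" concrete, here is a minimal sketch of routing an application to a dedicated YARN queue at submit time. It only assembles a spark-submit command line using the `--queue` option; the helper `build_submit_command`, the queue name `high_priority`, and the application path are hypothetical. It also assumes such a queue has already been defined in the YARN scheduler configuration, since any resource guarantee comes from the queue's settings on the YARN side, not from Spark.

```python
import shlex


def build_submit_command(app_path, queue, extra_conf=None):
    """Assemble a spark-submit argument list that targets a YARN queue.

    `queue` must name a queue that already exists in the YARN scheduler
    configuration (assumption: the cluster admin has created it).
    """
    cmd = ["spark-submit", "--master", "yarn", "--queue", queue]
    # Optional Spark properties, emitted in a stable (sorted) order.
    for key, value in sorted((extra_conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app_path)
    return cmd


if __name__ == "__main__":
    # Hypothetical application and queue names, for illustration only.
    cmd = build_submit_command(
        "my_app.py",
        "high_priority",
        {"spark.dynamicAllocation.enabled": "true"},
    )
    print(" ".join(shlex.quote(part) for part in cmd))
```

The same choice can also be expressed through the `spark.yarn.queue` property instead of the `--queue` option.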
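The distinction is explained in the Spark docs under "Job Scheduling". I was simply bitten by careless reading, combined with confusing "job" in the technical Spark sense (i.e. an action within a Spark application) with "job" as my coworkers and I commonly use it, meaning an application submitted to the cluster.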
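
For contrast, here is a sketch of how a scheduler pool is meant to be used for jobs inside a single application. The Spark docs describe selecting a pool per thread through the local property `spark.scheduler.pool` on the SparkContext, with `spark.scheduler.mode` set to `FAIR`. The context manager `scheduler_pool` below is a hypothetical helper, not part of Spark; it assumes `sc` is a PySpark `SparkContext` (or any object exposing `setLocalProperty`) and that passing `None` clears the property, mirroring the `null` reset shown in the Scala docs.

```python
from contextlib import contextmanager

POOL_PROPERTY = "spark.scheduler.pool"


@contextmanager
def scheduler_pool(sc, pool_name):
    """Run the enclosed Spark actions in the given fair scheduler pool.

    Hypothetical helper: `sc` is assumed to be a SparkContext of an
    application started with spark.scheduler.mode=FAIR.
    """
    # Jobs submitted from this thread are assigned to `pool_name`.
    sc.setLocalProperty(POOL_PROPERTY, pool_name)
    try:
        yield sc
    finally:
        # Clear the property so later jobs fall back to the default pool.
        sc.setLocalProperty(POOL_PROPERTY, None)


# Usage sketch (requires a running SparkContext named `sc`):
#
#     with scheduler_pool(sc, "high_priority"):
#         sc.parallelize(range(100)).count()  # this action is one Spark "job"
```

This only prioritizes jobs against other jobs of the same application; it does not affect how many executors YARN grants the application in the first place.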
