Making Spark use the /etc/hosts file to bind in cluster mode

t1qtbnec posted on 2021-06-03 in Hadoop

I am setting up a Spark cluster on machines that each have two network interfaces, one public and one private. The /etc/hosts file on every machine in the cluster lists the internal IPs of all the other machines, one entry per node in the form:
internal_ip fqdn
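
For illustration, a hosts file of that shape might look like this (the hostnames and private addresses below are made up):

10.0.0.1  master1.cluster.local
10.0.0.2  worker1.cluster.local
10.0.0.3  worker2.cluster.local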
However, when I request a SparkContext through pyspark in client mode ( pyspark --master yarn --deploy-mode client ), Akka binds to the public IP and the connection times out.

15/11/07 23:29:23 INFO Remoting: Starting remoting
15/11/07 23:29:23 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkYarnAM@public_ip:44015]
15/11/07 23:29:23 INFO util.Utils: Successfully started service 'sparkYarnAM' on port 44015.
15/11/07 23:29:23 INFO yarn.ApplicationMaster: Waiting for Spark driver to be reachable.
15/11/07 23:31:30 ERROR yarn.ApplicationMaster: Failed to connect to driver at yarn_driver_public_ip:48875, retrying ...
15/11/07 23:31:30 ERROR yarn.ApplicationMaster: Uncaught exception: 
org.apache.spark.SparkException: Failed to connect to driver!
    at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:427)
    at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:293)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:149)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:574)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:66)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:65)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:65)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:572)
    at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:599)
    at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
15/11/07 23:31:30 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)
15/11/07 23:31:30 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)
15/11/07 23:31:30 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1446960366742_0002

As the log shows, the private IP is ignored entirely. How can I make YARN and Spark use the private IP addresses specified in the hosts file?
The cluster was provisioned with Ambari (HDP 2.4).

f45qwnt8 1#

This is currently a problem in Spark; the only way to make Spark bind to the correct interface is to use a custom name server.
Spark essentially does a hostname lookup and uses the IP address it finds when binding with Akka. The workaround is to create a custom bind zone and run a name server.
https://issues.apache.org/jira/browse/spark-5113
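
For illustration only, a minimal BIND-style zone file for such a private zone might look like the following; the domain, serial number, and addresses are hypothetical and would have to match your cluster's actual hostnames:

$TTL 300
@        IN SOA ns1.cluster.local. admin.cluster.local. (
             2015110701 ; serial
             3600       ; refresh
             600        ; retry
             86400      ; expire
             300 )      ; negative caching TTL
         IN NS  ns1.cluster.local.
ns1      IN A   10.0.0.1
master1  IN A   10.0.0.1
worker1  IN A   10.0.0.2
worker2  IN A   10.0.0.3

Serving this zone only to the cluster's private network (by pointing each node's resolver at that name server) means the hostname lookup Spark performs resolves to the private address instead of the public one.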

y3bcpkx1 2#

+1 to the question.
Spark uses Akka for communication, so this is more of an Akka issue than a Spark one.
If you need to bind the network interface to a different address, use the akka.remote.netty.tcp.bind-hostname and akka.remote.netty.tcp.bind-port settings.
http://doc.akka.io/docs/akka/snapshot/additional/faq.html#why_are_replies_not_received_from_a_remote_actor_
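
For reference, those settings live in Akka's HOCON configuration (application.conf); a sketch is below with placeholder addresses. Whether and how this configuration can be passed through to the Akka instance bundled with a given Spark version is not covered by this answer, so treat it as an illustration of the Akka mechanism rather than a ready-made Spark fix.

akka.remote.netty.tcp {
  hostname      = "public.example.com"   # externally reachable address advertised to remote peers
  port          = 2552                   # externally advertised port
  bind-hostname = "10.0.0.5"             # local (private) interface the server socket actually binds to
  bind-port     = 2552                   # local port to bind to
}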
