Submitting a local Spark job to EMR

rqqzpn5f posted on 2021-05-27 in Hadoop

I followed the Amazon documentation for submitting a Spark job to a remote EMR cluster: https://aws.amazon.com/premiumsupport/knowledge-center/emr-submit-spark-job-remote-cluster/
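As I understand the article, the setup is roughly: copy the Hadoop and Spark configuration from the master node to the local machine and point spark-submit at the remote cluster's YARN. This is only a sketch of what I ran; the local paths, class name, and jar are placeholders:

# Config copied from the EMR master node (per the linked article); local paths are placeholders
export HADOOP_CONF_DIR=/etc/hadoop/conf
export SPARK_CONF_DIR=/etc/spark/conf

# Submit to the remote cluster's YARN; the class and jar below are placeholders
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyJob \
  my-job.jar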
After following the instructions and doing the usual troubleshooting, it fails because an address cannot be resolved, with a message like this:
ERROR spark.SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: java.net.UnknownHostException: ip-172-32-1-231.us-east-2.compute.internal
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
Since the IP it was trying to resolve is the master node's, I used sed to change it to the public address in the configuration files (the ones copied from the /etc/hadoop/conf directory on the master node; see the sketch after the log below). But then the error became a connection timeout against a data node:
INFO hdfs.DFSClient: Exception in createBlockOutputStream
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/172.32.1.41:50010]
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
    at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1606)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1404)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1357)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:587)
19/02/08 13:54:58 INFO hdfs.DFSClient: Abandoning BP-1960505320-172.32.1.231-1549632479324:blk_1073741907_1086
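For reference, the hostname replacement mentioned above was done roughly like this (a sketch; the public DNS name is a placeholder, and the files are the ones copied locally from /etc/hadoop/conf):

# Replace the master node's private hostname with its public DNS name in the copied configs
# (ec2-XX-XX-XX-XX... is a placeholder, not my real public DNS name)
sed -i 's/ip-172-32-1-231.us-east-2.compute.internal/ec2-XX-XX-XX-XX.us-east-2.compute.amazonaws.com/g' \
    core-site.xml hdfs-site.xml yarn-site.xml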
Finally, since the failure happens while uploading the resource files, I tried the same solution as the question "Spark HDFS Exception in createBlockOutputStream", which is to add the following to the hdfs-site.xml file:

<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>

But the failure persists, now as an unresolved-address exception:

19/02/08 13:58:06 WARN hdfs.DFSClient: DataStreamer Exception
java.nio.channels.UnresolvedAddressException
    at sun.nio.ch.Net.checkAddress(Net.java:101)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1606)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1404)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1357)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:587)

Can anyone help me set up Spark on my local machine so that spark-submit works against a remote EMR cluster?


ukqbszuj1#

In addition to following the answer to the linked question, you also need to add each worker node's (public) IP and (private) DNS name to your /etc/hosts file.
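For example, something along these lines in /etc/hosts on the local machine (the public IPs are placeholders, and the private DNS names are inferred from the addresses appearing in the errors above):

# Map each EMR node's private DNS name to its public IP (addresses below are placeholders)
203.0.113.10   ip-172-32-1-231.us-east-2.compute.internal
203.0.113.11   ip-172-32-1-41.us-east-2.compute.internal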
