Run a jar in Hadoop on Google Cloud using yarn-client

I want to run a jar in Hadoop on Google Cloud using yarn-client.
I run this command on the Hadoop master node:

spark-submit --class find --master yarn-client find.jar

but it returns this error:

15/06/17 10:11:06 INFO client.RMProxy: Connecting to ResourceManager at hadoop-m-on8g/10.240.180.15:8032
15/06/17 10:11:07 INFO ipc.Client: Retrying connect to server: hadoop-m-on8g/10.240.180.15:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

What is the problem? In case it is useful, this is my yarn-site.xml:

<?xml version="1.0" ?>
<!--
     <configuration>
      <!-- Site specific YARN configuration properties -->
      <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/yarn-logs/</value>
        <description>
          The remote path, on the default FS, to store logs.
        </description>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop-m-on8g</value>
      </property>
      <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>5999</value>
        <description>

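In your case, it looks like the YARN ResourceManager may be unhealthy for unknown reasons; you can try to fix YARN by restarting it with the following: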

sudo sudo -u hadoop /home/hadoop/hadoop-install/sbin/stop-yarn.sh
sudo sudo -u hadoop /home/hadoop/hadoop-install/sbin/start-yarn.sh
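
After the restart, one quick sanity check is whether anything accepts connections on the ResourceManager address from the log above (hadoop-m-on8g, port 8032). This is a minimal sketch, assuming bash and the coreutils timeout command are installed on the master; rm_port_open is a hypothetical helper name, not part of Hadoop:

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if a TCP connection to host:port opens
# within 5 seconds, and fails otherwise.
rm_port_open() {
  # bash's /dev/tcp pseudo-device attempts a TCP connection; inside bash -c,
  # $0 is the host and $1 is the port passed after the script string.
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Host and port are taken from the "Connecting to ResourceManager" log line.
if rm_port_open hadoop-m-on8g 8032; then
  echo "ResourceManager port is reachable"
else
  echo "ResourceManager port is not reachable"
fi
```

If the port is still not reachable after the restart, the ResourceManager probably did not come up, and its own log on the master is the next place to look.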

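However, it looks like you are using the Click-to-Deploy solution; Click-to-Deploy's Spark + Hadoop 2 deployment does not actually support Spark on YARN at the moment, because of some bugs and missing memory configuration. If you just try to run it with --master yarn-client out of the box, you would normally run into something like this: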

15/06/17 17:21:08 INFO cluster.YarnClientSchedulerBackend: Application report from ASM: 
   appMasterRpcPort: -1
   appStartTime: 1434561664937
   yarnAppState: ACCEPTED

15/06/17 17:21:09 INFO cluster.YarnClientSchedulerBackend: Application report from ASM: 
   appMasterRpcPort: -1
   appStartTime: 1434561664937
   yarnAppState: ACCEPTED

15/06/17 17:21:10 INFO cluster.YarnClientSchedulerBackend: Application report from ASM: 
   appMasterRpcPort: 0
   appStartTime: 1434561664937
   yarnAppState: RUNNING

15/06/17 17:21:15 ERROR cluster.YarnClientSchedulerBackend: Yarn application already ended: FAILED
15/06/17 17:21:15 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
15/06/17 17:21:15 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
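
The output above is easier to read as a sequence of state changes. As a rough sketch, assuming the driver output has the same layout as the excerpt above, a small awk filter can collapse it; yarn_state_transitions is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: prints each yarnAppState value when it changes,
# followed by the final state from the "already ended" error line, if any.
# Reads the named files, or standard input when no file is given.
yarn_state_transitions() {
  awk '
    /yarnAppState:/ {
      state = $2
      if (state != prev) { print state; prev = state }
    }
    /Yarn application already ended:/ { print $NF }
  ' "$@"
}

# Demo on lines copied from the log excerpt above.
yarn_state_transitions <<'EOF'
   yarnAppState: ACCEPTED
   yarnAppState: ACCEPTED
   yarnAppState: RUNNING
15/06/17 17:21:15 ERROR cluster.YarnClientSchedulerBackend: Yarn application already ended: FAILED
EOF
```

For the excerpt above this prints ACCEPTED, RUNNING, FAILED on separate lines: the application is accepted, starts, and fails within a few seconds. The reason for the failure is not in this driver output; if log aggregation is enabled, `yarn logs -applicationId <application id>` may show the container logs.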

The well-supported way to deploy a cluster on Google Compute Engine with Hadoop 2 and Spark configured to run on YARN is to use bdutil. You would run something like:

./bdutil -P <instance prefix> -p <project id> -b <bucket> -z <zone> -d  \
    -e extensions/spark/spark_on_yarn_env.sh generate_config my_custom_env.sh
./bdutil -e my_custom_env.sh deploy

# Shorthand for logging in to the master

./bdutil -e my_custom_env.sh shell

# Handy way to run a socks proxy to make it easy to access the web UIs

./bdutil -e my_custom_env.sh socksproxy

# When done, delete your cluster

./bdutil -e my_custom_env.sh delete

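With spark_on_yarn_env.sh, Spark should default to yarn-client, though you can always re-specify --master yarn-client if you want. You can see a more detailed explanation of the flags available in bdutil with ./bdutil --help. Here are the help entries for just the flags I included above: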

-b, --bucket
  Google Cloud Storage bucket used in deployment and by the cluster.

-d, --use_attached_pds
  If true, uses additional non-boot volumes, optionally creating them on
  deploy if they don't exist already and deleting them on cluster delete.

-e, --env_var_files
  Comma-separated list of bash files that are sourced to configure the cluster
  and installed software. Files are sourced in order with later files being
  sourced last. bdutil_env.sh is always sourced first. Flag arguments are
  set after all sourced files, but before the evaluate_late_variable_bindings
  method of bdutil_env.sh. see bdutil_env.sh for more information.

-P, --prefix
  Common prefix for cluster nodes.

-p, --project
  The Google Cloud Platform project to use to create the cluster.

-z, --zone
  Specify the Google Compute Engine zone to use.
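
To make the placeholders concrete, here is a small dry-run sketch that fills in the flags described above and prints the resulting generate_config command instead of running it. All values (my-cluster, my-project-id, my-bucket, us-central1-a) and the function name generate_config_cmd are hypothetical; substitute your own:

```shell
#!/bin/sh
# Hypothetical example values; replace them with your own settings.
PREFIX="my-cluster"        # -P: common prefix for cluster nodes
PROJECT="my-project-id"    # -p: Google Cloud Platform project
BUCKET="my-bucket"         # -b: Google Cloud Storage bucket
ZONE="us-central1-a"       # -z: Google Compute Engine zone
ENV_FILE="my_custom_env.sh"

# Prints the bdutil generate_config command line without executing it.
generate_config_cmd() {
  printf './bdutil -P %s -p %s -b %s -z %s -d -e extensions/spark/spark_on_yarn_env.sh generate_config %s\n' \
    "$PREFIX" "$PROJECT" "$BUCKET" "$ZONE" "$ENV_FILE"
}

generate_config_cmd
```

Printing the command first makes it easy to check the values before creating any resources; once it looks right, run the printed line from the bdutil directory.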
