
3b6akqbq  posted on 2021-05-30  in Hadoop


spark-submit --class find --master yarn-client find.jar


15/06/17 10:11:06 INFO client.RMProxy: Connecting to ResourceManager at hadoop-m-on8g/
15/06/17 10:11:07 INFO ipc.Client: Retrying connect to server: hadoop-m-on8g/ Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
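
The retry message above means the client cannot open a connection to the ResourceManager address it was configured with. As a rough diagnostic, the sketch below checks whether a host and port accept TCP connections from the machine running spark-submit. It assumes bash (for its /dev/tcp feature) and the coreutils `timeout` command are available; the function name `rm_reachable` is hypothetical, and the default port 8032 is only the stock YARN default for yarn.resourcemanager.address, since the address in the log is truncated.

```shell
# Return 0 if a TCP connection to host:port can be opened within 3 seconds.
# rm_reachable is a hypothetical helper; port defaults to 8032, the stock
# YARN default for yarn.resourcemanager.address (an assumption here).
rm_reachable() {
  local host="$1"
  local port="${2:-8032}"
  # bash opens a TCP socket when redirecting to /dev/tcp/<host>/<port>;
  # the inner shell exits non-zero if the connection is refused.
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

For example, `rm_reachable hadoop-m-on8g 8032 && echo reachable || echo unreachable` would report whether anything is listening there; a failure suggests either that the ResourceManager is not running or that the client is pointed at the wrong address.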


<?xml version="1.0" ?>
      <!-- Site specific YARN configuration properties -->
          The remote path, on the default FS, to store logs.



sudo sudo -u hadoop /home/hadoop/hadoop-install/sbin/stop-yarn.sh
sudo sudo -u hadoop /home/hadoop/hadoop-install/sbin/start-yarn.sh

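However, it looks like you are using the click-to-deploy solution; the click-to-deploy Spark + Hadoop 2 deployment does not actually support Spark on YARN at the moment, because of some bugs and insufficient memory configuration. If you just try to run with --master yarn-client out of the box, you will typically run into something like this: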

15/06/17 17:21:08 INFO cluster.YarnClientSchedulerBackend: Application report from ASM: 
   appMasterRpcPort: -1
   appStartTime: 1434561664937
   yarnAppState: ACCEPTED

15/06/17 17:21:09 INFO cluster.YarnClientSchedulerBackend: Application report from ASM: 
   appMasterRpcPort: -1
   appStartTime: 1434561664937
   yarnAppState: ACCEPTED

15/06/17 17:21:10 INFO cluster.YarnClientSchedulerBackend: Application report from ASM: 
   appMasterRpcPort: 0
   appStartTime: 1434561664937
   yarnAppState: RUNNING

15/06/17 17:21:15 ERROR cluster.YarnClientSchedulerBackend: Yarn application already ended: FAILED
15/06/17 17:21:15 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
15/06/17 17:21:15 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
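
The excerpt above shows the application going from ACCEPTED to RUNNING and then ending as FAILED. When the driver output is long, it can help to condense it to just those transitions. The following is a minimal sketch, assuming the driver output has been saved to a file whose path is passed as the first argument; the function name `yarn_states` is hypothetical and the patterns are based only on the log format shown here.

```shell
# Print each change of yarnAppState found in a saved Spark driver log,
# followed by the final status from any "already ended" error line.
# yarn_states is a hypothetical helper; $1 is the path to the saved log.
yarn_states() {
  awk '
    /yarnAppState:/ {
      state = $2                      # e.g. ACCEPTED, RUNNING
      if (state != prev) { print state; prev = state }
    }
    /Yarn application already ended:/ {
      print $NF                       # last field, e.g. FAILED
    }
  ' "$1"
}
```

Run against the lines shown above, this would print ACCEPTED, RUNNING and FAILED on separate lines, which makes it easier to see that the application was scheduled and started before it died.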


./bdutil -P <instance prefix> -p <project id> -b <bucket> -z <zone> -d  \
    -e extensions/spark/spark_on_yarn_env.sh generate_config my_custom_env.sh
./bdutil -e my_custom_env.sh deploy

# Shorthand for logging in to the master

./bdutil -e my_custom_env.sh shell

# Handy way to run a socks proxy to make it easy to access the web UIs

./bdutil -e my_custom_env.sh socksproxy

# When done, delete your cluster

./bdutil -e my_custom_env.sh delete

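With spark_on_yarn_env.sh, Spark should default to yarn-client, but you can always re-specify --master yarn-client if you want to. You can see a more detailed explanation of the flags available in bdutil with ./bdutil --help. Here are the help entries for the flags I included above: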

-b, --bucket
  Google Cloud Storage bucket used in deployment and by the cluster.

-d, --use_attached_pds
  If true, uses additional non-boot volumes, optionally creating them on
  deploy if they don't exist already and deleting them on cluster delete.

-e, --env_var_files
  Comma-separated list of bash files that are sourced to configure the cluster
  and installed software. Files are sourced in order with later files being
  sourced last. bdutil_env.sh is always sourced first. Flag arguments are
  set after all sourced files, but before the evaluate_late_variable_bindings
  method of bdutil_env.sh. see bdutil_env.sh for more information.

-P, --prefix
  Common prefix for cluster nodes.

-p, --project
  The Google Cloud Platform project to use to create the cluster.

-z, --zone
  Specify the Google Compute Engine zone to use.
