Spark worker does not resolve the master host on ECS, but works with an IP address

lmvvr0a8 · posted 2021-05-27 · Spark

I'm trying to run Spark 3.0 on AWS ECS on EC2. I have a Spark worker service and a Spark master service. When I try to start the worker using the master's hostname (exposed via ECS service discovery), it cannot be resolved. When I hard-code the IP address/port instead, it works.
Below are some commands I ran inside the worker Docker container after SSHing into the ECS container instance (EC2):


# as can be seen below, the master host is reachable from the worker Docker container

root@b87fad6a3ffa:/usr/spark-3.0.0# ping spark_master.mynamespace
PING spark_master.mynamespace (172.21.60.11) 56(84) bytes of data.
64 bytes from ip-172-21-60-11.eu-west-1.compute.internal (172.21.60.11): icmp_seq=1 ttl=254 time=0.370 ms

# the following works just fine -- starting the worker successfully and connecting to the master:

root@b87fad6a3ffa:/usr/spark-3.0.0# /bin/sh -c "bin/spark-class org.apache.spark.deploy.worker.Worker spark://172.21.60.11:7077"

# !!! this is the fail

root@b87fad6a3ffa:/usr/spark-3.0.0# /bin/sh -c "bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark_master.mynamespace:7077"
20/07/01 21:03:41 INFO worker.Worker: Started daemon with process name: 422@b87fad6a3ffa
20/07/01 21:03:41 INFO util.SignalUtils: Registered signal handler for TERM
20/07/01 21:03:41 INFO util.SignalUtils: Registered signal handler for HUP
20/07/01 21:03:41 INFO util.SignalUtils: Registered signal handler for INT
20/07/01 21:03:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/07/01 21:03:42 INFO spark.SecurityManager: Changing view acls to: root
20/07/01 21:03:42 INFO spark.SecurityManager: Changing modify acls to: root
20/07/01 21:03:42 INFO spark.SecurityManager: Changing view acls groups to:
20/07/01 21:03:42 INFO spark.SecurityManager: Changing modify acls groups to:
20/07/01 21:03:42 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
20/07/01 21:03:42 INFO util.Utils: Successfully started service 'sparkWorker' on port 39915.
20/07/01 21:03:42 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[main,5,main]
org.apache.spark.SparkException: Invalid master URL: spark://spark_master.mynamespace:7077
    at org.apache.spark.util.Utils$.extractHostPortFromSparkUrl(Utils.scala:2397)
    at org.apache.spark.rpc.RpcAddress$.fromSparkURL(RpcAddress.scala:47)
    at org.apache.spark.deploy.worker.Worker$.$anonfun$startRpcEnvAndEndpoint$3(Worker.scala:859)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
    at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
    at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
    at scala.collection.TraversableLike.map(TraversableLike.scala:238)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
    at org.apache.spark.deploy.worker.Worker$.startRpcEnvAndEndpoint(Worker.scala:859)
    at org.apache.spark.deploy.worker.Worker$.main(Worker.scala:828)
    at org.apache.spark.deploy.worker.Worker.main(Worker.scala)
20/07/01 21:03:42 INFO util.ShutdownHookManager: Shutdown hook called

# following is just FYI

root@b87fad6a3ffa:/usr/spark-3.0.0# /bin/sh -c "bin/spark-class org.apache.spark.deploy.worker.Worker --help"
20/07/01 21:16:10 INFO worker.Worker: Started daemon with process name: 552@b87fad6a3ffa
20/07/01 21:16:10 INFO util.SignalUtils: Registered signal handler for TERM
20/07/01 21:16:10 INFO util.SignalUtils: Registered signal handler for HUP
20/07/01 21:16:10 INFO util.SignalUtils: Registered signal handler for INT
Usage: Worker [options] <master>

Master must be a URL of the form spark://hostname:port

Options:
  -c CORES, --cores CORES  Number of cores to use
  -m MEM, --memory MEM     Amount of memory to use (e.g. 1000M, 2G)
  -d DIR, --work-dir DIR   Directory to run apps in (default: SPARK_HOME/work)
  -i HOST, --ip IP         Hostname to listen on (deprecated, please use --host or -h)
  -h HOST, --host HOST     Hostname to listen on
  -p PORT, --port PORT     Port to listen on (default: random)
  --webui-port PORT        Port for web UI (default: 8081)
  --properties-file FILE   Path to a custom Spark properties file.
                           Default is conf/spark-defaults.conf.
...

The master itself works fine; I can reach its admin web UI on port 8080 and so on.
Any idea why Spark does not resolve the hostname but works with the IP address?


4nkexdtk1#

The problem was the underscore ("_") I was using in the hostnames. When I changed spark_master and spark_worker to use a hyphen ("-") instead, the problem was solved.
Related links:
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6587184
URI.getHost returns null. Why?
The relevant snippet from the Spark codebase:

def extractHostPortFromSparkUrl(sparkUrl: String): (String, Int) = {
    try {
      val uri = new java.net.URI(sparkUrl)
      val host = uri.getHost
      val port = uri.getPort
      if (uri.getScheme != "spark" ||
        host == null ||
        port < 0 ||
        (uri.getPath != null && !uri.getPath.isEmpty) || // uri.getPath returns "" instead of null
        uri.getFragment != null ||
        uri.getQuery != null ||
        uri.getUserInfo != null) {
        throw new SparkException("Invalid master URL: " + sparkUrl)
      }
      (host, port)
    } catch {
      case e: java.net.URISyntaxException =>
        throw new SparkException("Invalid master URL: " + sparkUrl, e)
    }
  }
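
To see why the underscore hostname trips this check, you can reproduce the behavior of java.net.URI.getHost on its own. The following is a minimal sketch (the HostnameCheck object name is made up for illustration): java.net.URI only treats the authority as server-based when the host is a syntactically valid hostname (letters, digits, and hyphens), so getHost returns null for a name containing an underscore, which is exactly what the check above rejects.

object HostnameCheck {
  def main(args: Array[String]): Unit = {
    // Underscore is not a legal hostname character, so URI falls back to a
    // registry-based authority: getHost returns null and getPort returns -1.
    val bad = new java.net.URI("spark://spark_master.mynamespace:7077")
    println(bad.getHost)  // null  -> Spark throws "Invalid master URL"
    println(bad.getPort)  // -1

    // With a hyphen the authority parses as host:port and both checks pass.
    val good = new java.net.URI("spark://spark-master.mynamespace:7077")
    println(good.getHost) // spark-master.mynamespace
    println(good.getPort) // 7077
  }
}

With the ECS service-discovery names changed to use hyphens (e.g. spark-master.mynamespace), the worker command from the question resolves the master as expected.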
