spark-submit to Kubernetes: packages not pulled by the executors

idv4meu8 · posted 2021-07-13 in Spark


I am trying to submit my PySpark application to a Kubernetes cluster (minikube) with spark-submit:

./bin/spark-submit \
   --master k8s://https://192.168.64.4:8443 \
   --deploy-mode cluster \
   --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1 \
   --conf spark.kubernetes.container.image='pyspark:dev' \
   --conf spark.kubernetes.container.image.pullPolicy='Never' \
   local:///main.py

The application needs to access a Kafka instance deployed in the cluster, so I specified the jar dependency:

--packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1

The container image I am using is based on the one I built with Spark's utility script, and I have packaged into it all the Python dependencies my application needs.
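For context, here is a sketch of how that base image build looks, assuming the utility script is Spark's bundled docker-image-tool.sh (the repo and tag below are placeholders; pyspark:dev is then built on top of the resulting spark-py image):

# Assumed base-image build with Spark's docker-image-tool.sh;
# repo/tag are placeholders, not the actual values I used.
./bin/docker-image-tool.sh \
   -r local \
   -t 3.0.1 \
   -p ./kubernetes/dockerfiles/spark/bindings/python/Dockerfile \
   build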
The driver deploys correctly, fetches the Kafka package (I can provide the logs if needed), and launches the executors in new pods.
But the executor pods crash:

ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.lang.ClassNotFoundException: org.apache.spark.sql.kafka010.KafkaBatchInputPartition
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1986)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1850)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2160)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:407)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

So I inspected the executor pod and found that the jar is not in the $SPARK_CLASSPATH directory (set to ':/opt/spark/jars/*').
When building the Docker image, do I also need to fetch the dependencies and include them in the Spark jars folder? (I thought the '--packages' option would also make the executors retrieve the specified jars.)
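If baking the dependencies into the image is indeed required, I assume it would look roughly like this during the image build (the artifact versions below are my guess for Spark 3.0.1 / Scala 2.12, not something I have verified):

# Rough sketch: download the Kafka connector and its runtime dependencies
# straight into the image's jar directory (versions are assumptions).
cd /opt/spark/jars
curl -fLO https://repo1.maven.org/maven2/org/apache/spark/spark-sql-kafka-0-10_2.12/3.0.1/spark-sql-kafka-0-10_2.12-3.0.1.jar
curl -fLO https://repo1.maven.org/maven2/org/apache/spark/spark-token-provider-kafka-0-10_2.12/3.0.1/spark-token-provider-kafka-0-10_2.12-3.0.1.jar
curl -fLO https://repo1.maven.org/maven2/org/apache/kafka/kafka-clients/2.4.1/kafka-clients-2.4.1.jar
curl -fLO https://repo1.maven.org/maven2/org/apache/commons/commons-pool2/2.6.2/commons-pool2-2.6.2.jar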

No answers yet.

