spark.yarn.jars - py4j.protocol.Py4JError: An error occurred while calling None.None. Trace:

hc2pp10m · asked on 2021-05-27 · Spark

I am trying to run a Spark job using the spark2-submit command. The Spark version installed on the cluster is Cloudera's Spark 2.1.0, and I am specifying the jars for version 2.4.0 via the conf spark.yarn.jars, as shown below -

spark2-submit \
 --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/virtualenv/path/bin/python \
 --conf spark.yarn.jars=hdfs:///some/path/spark24/* \
 --conf spark.yarn.maxAppAttempts=1 \
 --conf spark.task.cpus=2 \
 --executor-cores 2 \
 --executor-memory 4g \
 --driver-memory 4g \
 --archives /virtualenv/path \
 --files /etc/hive/conf/hive-site.xml \
 --name my_app \
  test.py

Here is my code in test.py -

import os
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

print("Spark Session created")

While running the submit command, I see a message like this -

yarn.Client: Source and destination file systems are the same. Not copying hdfs:///some/path/spark24/some.jar

And then I get this error on the line that creates the Spark session -

spark = SparkSession.builder.getOrCreate()
  File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/pyspark.zip/pyspark/sql/session.py", line 169, in getOrCreate
  File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/pyspark.zip/pyspark/context.py", line 310, in getOrCreate
  File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/pyspark.zip/pyspark/context.py", line 115, in __init__
  File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/pyspark.zip/pyspark/context.py", line 259, in _ensure_initialized
  File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/pyspark.zip/pyspark/java_gateway.py", line 117, in launch_gateway
  File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 175, in java_import
  File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 323, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling None.None. Trace:
Authentication error: unexpected command.

The py4j in the error is coming from the existing Spark install, not from the version in my jars. Are my spark24 jars not being picked up? If I remove the jars conf, the same code runs fine, but presumably against the existing Spark version 2.1.0. Any clue how to fix this? Thanks.
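One way to see which py4j/pyspark the driver actually resolves is to inspect sys.path from the same interpreter that spark2-submit launches: the entry that appears first wins at import time. A small stdlib-only diagnostic sketch (`first_on_path` is a hypothetical helper written for this check, not part of any Spark API):

```python
import sys

def first_on_path(substr):
    """Return the first sys.path entry containing `substr`, or None."""
    for entry in sys.path:
        if substr in entry:
            return entry
    return None

# If the first matching entry points at the SPARK2-2.1.0 parcel rather than
# the Spark 2.4 packages you shipped, the old py4j-0.10.4 is what gets loaded,
# which would produce exactly this kind of gateway/protocol mismatch.
print("py4j resolved from:", first_on_path("py4j"))
print("pyspark resolved from:", first_on_path("pyspark"))
```

Dropping these two prints at the top of test.py (before the SparkSession is created) shows whether the driver's Python is looking at the parcel's libraries or at your own.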

xriantvc · answer #1

The problem was that Python was running from the wrong place. I had to point it at the right place like this -
PYTHONPATH=./${virtualenv}/venv/lib/python3.6/site-packages/ spark2-submit
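Spelled out against the original command, the fix would look something like this (a sketch: paths and flags follow the question's own placeholders, and `${virtualenv}` is whatever name the shipped archive unpacks to, so adjust for your layout):

```shell
# Prefix PYTHONPATH so the Spark 2.4 pyspark/py4j shipped with the virtualenv
# is found before the cluster's Spark 2.1.0 copies when the driver starts.
PYTHONPATH=./${virtualenv}/venv/lib/python3.6/site-packages/ spark2-submit \
 --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/virtualenv/path/bin/python \
 --conf spark.yarn.jars=hdfs:///some/path/spark24/* \
 --conf spark.yarn.maxAppAttempts=1 \
 --conf spark.task.cpus=2 \
 --executor-cores 2 \
 --executor-memory 4g \
 --driver-memory 4g \
 --archives /virtualenv/path \
 --files /etc/hive/conf/hive-site.xml \
 --name my_app \
  test.py
```

Setting the variable inline like this affects only the driver process that spark2-submit starts; the executors' Python is still controlled by the PYSPARK_PYTHON conf above.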
