I am using the Cloudera QuickStart VM with CDH 5.3.0 (in terms of parcels) and Spark 1.2.0, with $SPARK_HOME=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark, and I submit my Spark application using the command:

./bin/spark-submit --class <Spark_App_Main_Class_Name> --master spark://localhost.localdomain:7077 --deploy-mode client --executor-memory 4G ../apps/<Spark_App_Target_Jar_Name>.jar
Spark_App_Main_Class_Name.scala:
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.mllib.util.MLUtils

object Spark_App_Main_Class_Name {
  def main(args: Array[String]) {
    val hConf = new SparkConf()
      .set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
      .set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)
    val sc = new SparkContext(hConf)
    val data = MLUtils.loadLibSVMFile(sc, "hdfs://localhost.localdomain:8020/analytics/data/mllib/sample_libsvm_data.txt")
    ...
  }
}
But I am getting a ClassNotFoundException for org.apache.hadoop.hdfs.DistributedFileSystem while submitting the application in client mode:
[cloudera@localhost bin]$ ./spark-submit --class Spark_App_Main_Class_Name --master spark://localhost.localdomain:7077 --deploy-mode client --executor-memory 4G ../apps/Spark_App_Target_Jar_Name.jar
15/11/30 09:46:34 INFO SparkContext: Spark configuration:
spark.app.name=Spark_App_Main_Class_Name
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/lib/native
spark.eventLog.dir=hdfs://localhost.localdomain:8020/user/spark/applicationHistory
spark.eventLog.enabled=true
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/lib/native
spark.executor.memory=4G
spark.jars=file:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/bin/../apps/Spark_App_Target_Jar_Name.jar
spark.logConf=true
spark.master=spark://localhost.localdomain:7077
spark.yarn.historyServer.address=http://localhost.localdomain:18088
15/11/30 09:46:34 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 10.113.234.150 instead (on interface eth12)
15/11/30 09:46:34 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/11/30 09:46:34 INFO SecurityManager: Changing view acls to: cloudera
15/11/30 09:46:34 INFO SecurityManager: Changing modify acls to: cloudera
15/11/30 09:46:34 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cloudera); users with modify permissions: Set(cloudera)
15/11/30 09:46:35 INFO Slf4jLogger: Slf4jLogger started
15/11/30 09:46:35 INFO Remoting: Starting remoting
15/11/30 09:46:35 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@10.113.234.150:59473]
15/11/30 09:46:35 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@10.113.234.150:59473]
15/11/30 09:46:35 INFO Utils: Successfully started service 'sparkDriver' on port 59473.
15/11/30 09:46:36 INFO SparkEnv: Registering MapOutputTracker
15/11/30 09:46:36 INFO SparkEnv: Registering BlockManagerMaster
15/11/30 09:46:36 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20151130094636-8c3d
15/11/30 09:46:36 INFO MemoryStore: MemoryStore started with capacity 267.3 MB
15/11/30 09:46:38 INFO HttpFileServer: HTTP File server directory is /tmp/spark-7d1f2861-a568-4919-8f7e-9a9fe6aab2b4
15/11/30 09:46:38 INFO HttpServer: Starting HTTP Server
15/11/30 09:46:38 INFO Utils: Successfully started service 'HTTP file server' on port 50003.
15/11/30 09:46:38 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/11/30 09:46:38 INFO SparkUI: Started SparkUI at http://10.113.234.150:4040
15/11/30 09:46:39 INFO SparkContext: Added JAR file:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/bin/../apps/Spark_App_Target_Jar_Name.jar at http://10.113.234.150:50003/jars/Spark_App_Target_Jar_Name.jar with timestamp 1448894799228
15/11/30 09:46:39 INFO AppClient$ClientActor: Connecting to master spark://localhost.localdomain:7077...
15/11/30 09:46:40 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20151130094640-0000
15/11/30 09:46:41 INFO NettyBlockTransferService: Server created on 56458
15/11/30 09:46:41 INFO BlockManagerMaster: Trying to register BlockManager
15/11/30 09:46:41 INFO BlockManagerMasterActor: Registering block manager 10.113.234.150:56458 with 267.3 MB RAM, BlockManagerId(<driver>, 10.113.234.150, 56458)
15/11/30 09:46:41 INFO BlockManagerMaster: Registered BlockManager
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2047)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
at org.apache.spark.util.FileLogger.<init>(FileLogger.scala:90)
at org.apache.spark.scheduler.EventLoggingListener.<init>(EventLoggingListener.scala:63)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:352)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:92)
at Spark_App_Main_Class_Name$.main(Spark_App_Main_Class_Name.scala:22)
at Spark_App_Main_Class_Name.main(Spark_App_Main_Class_Name.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1953)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2045)
... 16 more
The Spark application seems unable to map the hdfs scheme to a FileSystem implementation, because initially I was getting this error:
Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
at org.apache.spark.util.FileLogger.<init>(FileLogger.scala:90)
at org.apache.spark.scheduler.EventLoggingListener.<init>(EventLoggingListener.scala:63)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:352)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:92)
at LogisticRegressionwithBFGS$.main(LogisticRegressionwithBFGS.scala:21)
at LogisticRegressionwithBFGS.main(LogisticRegressionwithBFGS.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Following hadoop No FileSystem for scheme: file, I added "fs.hdfs.impl" and "fs.file.impl" to the Spark configuration settings shown above.
3 Answers

bksxznpy1#
I faced the same problem when running Spark code from an IDE and accessing a remote HDFS. So I set the configuration below, and that resolved the problem.
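A minimal sketch of such a configuration, assuming the same fs.hdfs.impl and fs.file.impl properties as in the question's SparkConf:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, LocalFileSystem}
import org.apache.hadoop.hdfs.DistributedFileSystem

// Bind the hdfs:// and file:// schemes to concrete implementations explicitly,
// so the lookup no longer depends on META-INF/services entries that a merged
// jar may have lost.
val hadoopConf = new Configuration()
hadoopConf.set("fs.hdfs.impl", classOf[DistributedFileSystem].getName)
hadoopConf.set("fs.file.impl", classOf[LocalFileSystem].getName)

// The NameNode URI below is the one from the question; substitute your own.
val fs = FileSystem.get(new URI("hdfs://localhost.localdomain:8020"), hadoopConf)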
bfhwhh0e2#
You need to have the hadoop-hdfs-2.x JARs (maven link) in your classpath. While submitting the application, mention the additional JAR location using the --jars option of spark-submit, as in the example below.
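For example, reusing the submit command from the question (the parcel path to the hadoop-hdfs JAR below is an assumption; check where it actually lives on your VM):

./bin/spark-submit --class Spark_App_Main_Class_Name --master spark://localhost.localdomain:7077 --deploy-mode client --executor-memory 4G --jars /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop-hdfs/hadoop-hdfs.jar ../apps/Spark_App_Target_Jar_Name.jar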
On another note, you should ideally be moving to CDH 5.5, which ships Spark 1.5.
c2e8gylq3#
I went through a detailed search on this problem and tried different approaches. Basically, the problem appears to be caused by the hadoop-hdfs JARs being unavailable: when submitting the Spark application, the dependent JARs could not be found, even after using maven-assembly-plugin or the maven-jar-plugin/maven-dependency-plugin combination. With maven-jar-plugin/maven-dependency-plugin, the main-class JAR and the dependent JARs do get created, but supplying the dependent JARs via the --jars option still produced the same error. Using maven-shade-plugin, as suggested by krookedking in hadoop No FileSystem for scheme: file, seems to hit the problem at the right point: building a single JAR file comprising the main class and all dependent classes eliminated the classpath issues. My final working spark-submit command is as follows:
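Assuming the shaded (uber) JAR is produced under the same name and path as before, the command has the same shape as the original one from the question, just with no extra --jars needed:

./bin/spark-submit --class Spark_App_Main_Class_Name --master spark://localhost.localdomain:7077 --deploy-mode client --executor-memory 4G ../apps/Spark_App_Target_Jar_Name.jar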
The maven-shade-plugin in my project pom.xml looks as shown below. Note: the excludes in the filter (see the sketch after this paragraph) let you get rid of the java.lang.SecurityException: Invalid signature file digest for Manifest main attributes that signed dependency JARs otherwise trigger when merged into a single JAR.
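A typical configuration of this kind (the plugin version and filter patterns here are illustrative assumptions, not necessarily the answerer's exact ones):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <!-- version is illustrative; use whatever your build standardizes on -->
  <version>2.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <filters>
          <filter>
            <artifact>*:*</artifact>
            <excludes>
              <!-- Drop signature files carried over from signed dependencies;
                   otherwise the merged jar fails with
                   "Invalid signature file digest for Manifest main attributes". -->
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
            </excludes>
          </filter>
        </filters>
      </configuration>
    </execution>
  </executions>
</plugin>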