Environment:
OS: CentOS 7
Java: 1.8
Spark: 2.4.5
Hadoop: 2.7.7
Scala: 2.12.11
Hardware: 3 computers
I built a simple Scala application. My code is:
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
object wordCount {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setAppName("WordCount")
    val context: SparkContext = new SparkContext(conf)
    val lines: RDD[String] = context.textFile(args(0))
    val words: RDD[String] = lines.flatMap(_.split(" "))
    val tuples: RDD[(String, Int)] = words.map((_, 1))
    val summed: RDD[(String, Int)] = tuples.reduceByKey(_ + _)
    val sorted: RDD[(String, Int)] = summed.sortBy(_._2, false)
    sorted.saveAsTextFile(args(1))
    context.stop()
  }
}
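For reference, the same pipeline can be sketched with plain Scala collections (no SparkContext needed); the object name is my own, and `reduceByKey` is stood in for by a local grouped sum:

```scala
object WordCountLocal {
  // Plain-Scala sketch of the word-count pipeline above; each step mirrors
  // the RDD transformation of the same name.
  def count(lines: Seq[String]): Seq[(String, Int)] = {
    val words  = lines.flatMap(_.split(" "))                                  // flatMap
    val tuples = words.map((_, 1))                                            // map
    val summed = tuples.groupBy(_._1).map { case (w, ps) => (w, ps.map(_._2).sum) } // reduceByKey
    summed.toSeq.sortBy { case (_, n) => -n }                                 // sortBy(_._2, false)
  }
}
```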
The build.sbt file is:
name := "SparkScalaTest2"
version := "0.1"
scalaVersion := "2.12.11"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.5"
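Note that `%%` appends the project's `scalaVersion` binary suffix to the artifact name, so this line resolves to spark-core_2.12. If the cluster's Spark 2.4.5 distribution ships scala-library 2.11 (the default for the pre-built download), a sketch of a matching build.sbt would be:

```scala
// Hypothetical build.sbt pinned to Scala 2.11, for a cluster whose
// Spark 2.4.5 jars were built against Scala 2.11; %% then resolves
// the dependency to spark-core_2.11.
name := "SparkScalaTest2"
version := "0.1"
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.5"
```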
My directory layout is as follows:
$ find .
.
./build.sbt
./src
./src/main
./src/main/scala
./src/main/scala/wordCount.scala
Then I packaged the jar with sbt package and submitted the application to Spark with the following command:
spark-submit \
--class wordCount \
--master spark://master:7077 \
--executor-memory 512M \
--total-executor-cores 3 \
/home/spark/IdeaProjects/SparkScalaTest2/target/scala-2.12/sparkscalatest2_2.12-0.1.jar \
hdfs://master:9000/data/text.txt \
hdfs://master:9000/result/wordCount
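One way to check which Scala version a Spark distribution was actually built with is to look at the scala-library jar it ships. A small sketch (the helper name and the Spark path are assumptions, not part of my setup):

```shell
# Hypothetical helper: print the Scala binary version (e.g. 2.11) that a
# Spark install was built with, based on the scala-library jar in its
# jars/ directory. On each node this would be run against $SPARK_HOME.
spark_scala_version() {
  ls "$1/jars" 2>/dev/null | sed -n 's/^scala-library-\([0-9]*\.[0-9]*\)\..*\.jar$/\1/p'
}
```

If this prints 2.11 while the application jar was built for Scala 2.12, the two binary versions are incompatible.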
But I got an error in the bash shell:
20/07/20 10:11:18 INFO spark.SparkContext: Created broadcast 0 from textFile at wordCount.scala:39
Exception in thread "main" java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: scala/runtime/java8/JFunction2$mcIII$sp
at wordCount$.main(wordCount.scala:45)
at wordCount.main(wordCount.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NoClassDefFoundError: scala/runtime/java8/JFunction2$mcIII$sp
... 14 more
Caused by: java.lang.ClassNotFoundException: scala.runtime.java8.JFunction2$mcIII$sp
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 14 more
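For context on the missing class: Scala 2.12 compiles a function literal such as `_ + _` over `Int`s to a Java 8 lambda targeting the specialized interface scala.runtime.java8.JFunction2$mcIII$sp, which exists only in the Scala 2.12+ runtime; a cluster whose jars ship scala-library 2.11 cannot load it. A minimal sketch (the object name is my own) that probes for it on the current classpath:

```scala
object SpecializedFunctionCheck {
  // The same shape of lambda that failed to bootstrap in the stack trace above.
  val add: (Int, Int) => Int = _ + _

  // True when the Scala 2.12-style specialized interface is on the classpath;
  // with a 2.11 scala-library it would be false, matching the
  // ClassNotFoundException above.
  def specializedInterfaceAvailable: Boolean =
    try { Class.forName("scala.runtime.java8.JFunction2$mcIII$sp"); true }
    catch { case _: ClassNotFoundException => false }
}
```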
The stderr log page of the executor allocated to the application shows the following error:
20/07/20 10:11:48 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIG
I searched online and found that the Scala version might be causing this problem. But the official Spark site (http://spark.apache.org/docs/2.4.5/index.html) says:
Spark runs on Java 8, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.4.5 uses Scala 2.12. You will need to use a compatible Scala version (2.12.x).
I don't know why. Anyway, I tried installing scala-2.11.12, as spark-shell suggested, and set the environment variables. After I submitted the application, no error appeared in the bash shell, but the stderr log page of the application's allocated executor showed the following error:
20/07/21 14:13:12 INFO executor.Executor: Finished task 1.0 in stage 4.0 (TID 7). 1459 bytes result sent to driver
20/07/21 14:13:12 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown
20/07/21 14:13:12 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
Despite the error, the application seems to have run successfully, and this time I got the correct output. When I looked at the project structure, I found the directories ./project/target/scala-2.12 and ./target/scala-2.11. Why do these two directories suggest different Scala versions? Is this the cause of my problem? How can I fix it?