Scala: object not serializable (class: org.apache.spark.SparkContext, value: org.apache.spark.SparkContext@30d34649)

z31licg0  posted on 2021-07-09  in Spark

I need your help with the following problem:

// Imports assumed for this snippet (the mllib API, to match Word2VecModel.load(sc, ...) below):
import scala.io.Source
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.feature.{Word2Vec, Word2VecModel}
import org.apache.spark.mllib.linalg.{Vectors, Vector => SparkVector}

def vectorizeReviews(dataPath: String, negReviewsPath: String, spark: SparkSession): Map[String, RDD[SparkVector]] = { 
        val sc = spark.sparkContext 
        //TODO val stopWords = Source.fromFile(dataPath + "stop_words").getLines.toSet 
        val stopWords = Source.fromFile("/home/user/LB/TP_RCP216_6_tweets/stop_words").getLines.toSet
        // ship the stop words to the worker nodes
        val bStopWords = sc.broadcast(stopWords) 

        //TODO val w2vModel = Word2VecModel.load(sc, dataPath + "/w2vModel")
        val word2vec = new Word2Vec()
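        // NB: trainNegDataSeq (presumably the tokenized negative reviews) is defined outside this snippet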
        val w2vModel = word2vec.fit(trainNegDataSeq)
        val vectors = w2vModel.getVectors.mapValues(vv => Vectors.dense(vv.map(_.toDouble))).map(identity) 
        val bVectors = sc.broadcast(vectors) 
        val vectSize = 100
        val negReviews = sc.textFile(negReviewsPath)
        //val posReviews = sc.textFile(posReviewsPath) 
        val negReviews2vec = negReviews.filter(sentence => sentence.length >= 1).map(sentence => sentence.toLowerCase.split("\\W+")
               ).map(wordSeq => {
                    var vSum = Vectors.zeros(vectSize)
                    var vNb = 0
                    wordSeq.foreach { word =>
                        if (!bStopWords.value(word) && (word.length >= 2)) {
                            bVectors.value.get(word).foreach { v =>
                                vSum = add(v, vSum)
                                vNb += 1
                            }
                        }
                    }
                    if (vNb != 0) {
                        vSum = scalarMultiply(1.0 / vNb, vSum)
                    }
                    vSum
                }).filter(vec => Vectors.norm(vec, 1.0) > 0.0).persist()

   val vectorizedReviewsMap = Map("NEG_REVIEWS" -> negReviews2vec)

   vectorizedReviewsMap
}
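
For background on the error below: Spark has to serialize the closure passed to map and ship it to every executor. If that closure references a field or method of an enclosing object, the whole object is pulled in, and when that object (an outer class, a notebook cell) holds the SparkContext, serialization fails with exactly this exception. Here is a minimal, self-contained sketch of the capture problem and the usual local-val workaround; all names in it are illustrative, not taken from the post:

import org.apache.spark.sql.SparkSession

object ClosureCaptureDemo {
    def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.master("local[*]").appName("closure-demo").getOrCreate()
        val sc = spark.sparkContext
        val words = sc.parallelize(Seq("spark", "scala"))

        // Fails with "Task not serializable": the lambda captures sc,
        // and SparkContext is not serializable.
        // words.map(w => w + sc.applicationId).collect()

        // Works: copy the needed value into a local val first, so only
        // a String is captured and serialized, never the SparkContext.
        val appId = sc.applicationId
        println(words.map(w => w + appId).collect().mkString(", "))

        spark.stop()
    }
}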

When the function is called:
val trainVectReviewsMap = vectorizeReviews(dataPath, trainNegReviewsPath, spark)
Here is the error:

org.apache.spark.SparkException: Task not serializable
Caused by: java.io.NotSerializableException: org.apache.spark.SparkContext
Serialization stack:
    - object not serializable (class: org.apache.spark.SparkContext, value: org.apache.spark.SparkContext@30d34649)

Regards
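
The stack trace narrows the problem down: something referenced inside the map closure drags the SparkContext along. In the function above, the likely suspects are the helpers add and scalarMultiply, because a method of an outer class forces Spark to serialize the entire outer instance, including any SparkContext field it holds. A common remedy, sketched here under the assumption that those helpers currently live in a class that also holds the context, is to move them into a standalone object; the helper bodies below are guesses at their semantics, not code from the post:

import org.apache.spark.mllib.linalg.{Vectors, Vector => SparkVector}

// A top-level object: calling its methods from an RDD closure captures no
// outer instance, so no SparkContext can be dragged into the task.
object VecOps extends Serializable {
    // Element-wise vector sum (assumed semantics of the post's add helper).
    def add(a: SparkVector, b: SparkVector): SparkVector =
        Vectors.dense(a.toArray.zip(b.toArray).map { case (x, y) => x + y })

    // Scale every component by k (assumed semantics of scalarMultiply).
    def scalarMultiply(k: Double, v: SparkVector): SparkVector =
        Vectors.dense(v.toArray.map(_ * k))
}

With that in place the closure would call VecOps.add(v, vSum) and VecOps.scalarMultiply(1.0 / vNb, vSum), and vectorizeReviews itself is best kept in an object that does not store the SparkSession or SparkContext as a field.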

