I need your help to solve my problem:
def vectorizeReviews(dataPath: String, negReviewsPath: String, spark: SparkSession): Map[String, RDD[SparkVector]] = {
  val sc = spark.sparkContext
  // TODO val stopWords = Source.fromFile(dataPath + "stop_words").getLines.toSet
  val stopWords = Source.fromFile("/home/user/LB/TP_RCP216_6_tweets/stop_words").getLines.toSet
  // broadcast the stop words to the worker nodes
  val bStopWords = sc.broadcast(stopWords)
  // TODO val w2vModel = Word2VecModel.load(sc, dataPath + "/w2vModel")
  val word2vec = new Word2Vec()
  val w2vModel = word2vec.fit(trainNegDataSeq)
  val vectors = w2vModel.getVectors.mapValues(vv => Vectors.dense(vv.map(_.toDouble))).map(identity)
  val bVectors = sc.broadcast(vectors)
  val vectSize = 100
  val negReviews = sc.textFile(negReviewsPath)
  // val posReviews = sc.textFile(posReviewsPath)
  val negReviews2vec = negReviews
    .filter(sentence => sentence.length >= 1)
    .map(sentence => sentence.toLowerCase.split("\\W+"))
    .map { wordSeq =>
      var vSum = Vectors.zeros(vectSize)
      var vNb = 0
      wordSeq.foreach { word =>
        if (!(bStopWords.value)(word) && (word.length >= 2)) {
          bVectors.value.get(word).foreach { v =>
            vSum = add(v, vSum)
            vNb += 1
          }
        }
      }
      if (vNb != 0) {
        vSum = scalarMultiply(1.0 / vNb, vSum)
      }
      vSum
    }
    .filter(vec => Vectors.norm(vec, 1.0) > 0.0)
    .persist()
  val vectorizedReviewsMap = Map("NEG_REVIEWS" -> negReviews2vec)
  vectorizedReviewsMap
}
When I call the function:

val trainVectReviewsMap = vectorizeReviews(dataPath, trainNegReviewsPath, spark)
Here is the error:
org.apache.spark.SparkException: Task not serializable
Caused by: java.io.NotSerializableException: org.apache.spark.SparkContext
Serialization stack:
- object not serializable (class: org.apache.spark.SparkContext, value: org.apache.spark.SparkContext@30d34649)
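As a minimal sketch of what this exception usually means (using made-up data, not the review pipeline above): the closure passed to an RDD transformation such as `map` is serialized and shipped to the executors, so it must not capture the `SparkContext`, directly or through an enclosing class instance. Referencing only broadcast handles and local, serializable values inside the closure avoids the problem.

```scala
import org.apache.spark.sql.SparkSession

object ClosureSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("sketch").getOrCreate()
    val sc = spark.sparkContext

    val stopWords = Set("the", "a")

    // BAD: using sc inside the closure forces Spark to serialize the SparkContext,
    // which throws "Task not serializable":
    // sc.parallelize(Seq("a b")).map(_ => sc.defaultParallelism)

    // GOOD: broadcast plain serializable data and reference only the broadcast
    // handle inside the closure.
    val bStopWords = sc.broadcast(stopWords)
    val counts = sc.parallelize(Seq("the cat", "a dog"))
      .map(line => line.split(" ").count(w => !bStopWords.value(w)))
    counts.collect().foreach(println)

    spark.stop()
  }
}
```

A related, less obvious trap: if `vectorizeReviews` is a method of a class that also holds the `SparkSession`, or if helpers like `add` and `scalarMultiply` are members of such a class, the closure can capture the whole enclosing instance (and with it the context). Moving those helpers into a serializable object, or copying needed fields into local `val`s before the `map`, keeps the closure clean.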
How can I fix this?