我正在尝试在twitter流中计算字数,并希望每30秒移动20分钟的窗口来计算字数。我将显示在一个字和前10个字计数的数据框的结果。但是,在下面的代码中,twitter流文件运行良好,但下面的代码不会产生任何结果:
# split each tweet into words
words = dataStream.flatMap(lambda line: line.split(" "))
# Count only the words
wordCounts = words.map(lambda w: (w, 1))
# adding the count of each word to its last count using updateStateByKey
word_totals = wordCounts.reduceByKeyAndWindow(aggregate_words_count, 600, 30)
word_totals.pprint()
# do the processing for each RDD generated in each interval
word_totals.foreachRDD(process_rdd)
# start the streaming computation
ssc.start()
# wait for the streaming to finish
ssc.awaitTermination()
我做错什么了?
谢谢
暂无答案!
目前还没有任何答案,快来回答吧!