How do I use Spark's hadoopFile method with a custom input format whose value type is Text? For example, OmnitureDataFileInputFormat, which is used to process Omniture clickstream data.
OmnitureDataFileInputFormat
jei2mxaa1#
import java.nio.charset.StandardCharsets

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.spark.rdd.RDD
import org.rassee.omniture.hadoop.mapred.OmnitureDataFileInputFormat

// Read each record as a (LongWritable, Text) pair through the old "mapred" API.
val rddLines: RDD[String] = sparkSession.sparkContext
  .hadoopFile(
    path = path,
    inputFormatClass = classOf[OmnitureDataFileInputFormat],
    keyClass = classOf[LongWritable],
    valueClass = classOf[Text]
  )
  // Hadoop reuses the same Text object for every record, so copy its bytes first.
  .map(_._2.copyBytes())
  // Decode the copied bytes as UTF-8 to get one String per record.
  .map(new String(_, StandardCharsets.UTF_8))
1 Answer