mapreduce

ih99xse1 · posted 2021-05-29 in Hadoop

I have a 5 GB file and am running a simple WordCount MapReduce job on a single-node cluster with a block size of 128 MB. After reading about 1.2 million records, the job starts reading again from the beginning of the same file. Pseudocode is below.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

Configuration objconf = new Configuration();
Path objInputPath = new Path("/home/abc/Desktop/Debug.csv");
Path objoutPath = new Path("/home/abc/Desktop/Outpath.csv");
Job objJob = Job.getInstance(objconf, "WordCount"); // new Job(conf, name) is deprecated
FileInputFormat.setInputPaths(objJob, objInputPath);
FileOutputFormat.setOutputPath(objJob, objoutPath);
objJob.setJarByClass(WordCount.class);
objJob.setMapperClass(WCMapper.class);
objJob.setInputFormatClass(TextInputFormat.class);
objJob.setOutputFormatClass(TextOutputFormat.class);
int j = objJob.waitForCompletion(true) ? 0 : 1;
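Note that the driver above sets no reducer and no map/reduce output types, which a WordCount job normally declares. A minimal sketch of the usual additional settings follows; WCReducer is hypothetical here, since the original post only shows the mapper:

// assumes: import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.Text;
objJob.setReducerClass(WCReducer.class);       // hypothetical reducer class, not in the post
objJob.setCombinerClass(WCReducer.class);      // optional: local pre-aggregation on map side
objJob.setOutputKeyClass(Text.class);          // key type emitted by mapper and reducer
objJob.setOutputValueClass(IntWritable.class); // value type emitted by mapper and reducer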

Mapper.java

private IntWritable one = new IntWritable(1);
private Text word = new Text();
String line = value.toString();
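
For context, a complete version of this mapper would look roughly like the following. This is a minimal sketch assuming standard whitespace tokenization, not the poster's actual code:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // Emit (token, 1) for every whitespace-separated token in the line.
        for (String token : line.split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, one);
            }
        }
    }
}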

No answers yet!
