Getting numbers in the Hadoop word count example on Cloudera

Posted by ccgok5k5 on 2021-05-27 in Hadoop


We used the code below: the map class is WCMapper and the reduce class is WCReducer.
I don't understand why the output contains numbers instead of word counts.

public class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = key.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            value.set(tokenizer.nextToken());
            context.write(value, new IntWritable(1));
        }
    }
}

public class WCReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable x : values) {
            sum += x.get();
        }
        result.set(sum);
        System.out.println("Key: " + key + "Value: " + sum);
        context.write(key, result);
    }
}

public static void main(String[] args) throws Exception{
    Configuration conf = new Configuration();

    Job job = Job.getInstance(conf, "WordCount");

    job.setJarByClass(WorCount.class);
    job.setMapperClass(WCMapper.class);
    job.setReducerClass(WCReducer.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    Path outputPath = new Path(args[1]);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    outputPath.getFileSystem(conf).delete(outputPath, true);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
}

Input file: this is cloudera this is smart
Expected output: this 2 is 2 cloudera 1 smart 1
Actual output: 0 1 17 1

hadoop word-count cloudera

Source: https://stackoverflow.com/questions/58764337/getting-numbers-in-hadoop-word-count-example-in-cloudera

1 Answer


sh7euo9m1#

The problem is in the mapper:

String line = key.toString();

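Here key is a LongWritable holding the byte offset of the line within the file, so tokenizing key.toString() emits offsets rather than words. If you change that line to read from value, and stop reusing value as the output key further down, you get the correct result.
New mapper: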

public void map(LongWritable key, Text value, Context context) throws IOException,InterruptedException { 
    String line = value.toString(); 
    StringTokenizer tokenizer = new StringTokenizer(line); 
    Text word = new Text();

    while(tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken()); 
        context.write(word, new IntWritable(1)); 
    }
}
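
To see where 0 and 17 come from, here is a minimal sketch in plain Java (no Hadoop dependency) that mimics the (byte offset, line) records TextInputFormat passes to map(). The class OffsetKeyDemo and its methods records and wordCount are hypothetical names used only for illustration. It also assumes the input file has two lines, "this is cloudera" and "this is smart", which is consistent with the second offset being 17 (16 characters plus the newline).

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical stand-in for the job: simulates the records, not Hadoop itself.
public class OffsetKeyDemo {

    // Builds (byte offset -> line text) pairs, like the key/value pairs
    // that TextInputFormat hands to the mapper for each line.
    static Map<Long, String> records(String fileContents) {
        Map<Long, String> out = new LinkedHashMap<>();
        long offset = 0;
        for (String line : fileContents.split("\n")) {
            out.put(offset, line);
            // Advance past the line's bytes plus one byte for '\n'.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return out;
    }

    // Counts tokens taken either from the key (the bug in the question)
    // or from the value (the fix in this answer).
    static Map<String, Integer> wordCount(String fileContents, boolean tokenizeKey) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys, for stable printing
        for (Map.Entry<Long, String> rec : records(fileContents).entrySet()) {
            String source = tokenizeKey ? rec.getKey().toString() : rec.getValue();
            StringTokenizer tokenizer = new StringTokenizer(source);
            while (tokenizer.hasMoreTokens()) {
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed file layout: two lines, each terminated by '\n'.
        String input = "this is cloudera\nthis is smart\n";
        System.out.println(wordCount(input, true));  // tokenizes the offsets
        System.out.println(wordCount(input, false)); // tokenizes the line text
    }
}
```

Running main prints {0=1, 17=1} for the key-based version, matching the output in the question, and {cloudera=1, is=2, smart=1, this=2} for the value-based version.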

Posted 2021-05-27
