Sorting with Hadoop MapReduce: returning a sorted list of the words in a text file

Posted by sirbozc5 on 2021-06-03 in Hadoop


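My task is to return an alphabetically sorted list of all the words contained in a text file, keeping duplicates.
{to be or not to be} → {be be not or to to}
My idea is to emit each word as both the key and the value. Since Hadoop sorts the keys, the words come out in alphabetical order automatically. In the reduce phase I then simply append all the words that share a key (so basically identical words) into a single Text value.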

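import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordSort {

  public static class Map extends Mapper<LongWritable, Text, Text, Text> {

    // Reused for every token to avoid allocating a new Text per word
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
      String line = value.toString();
      StringTokenizer tokenizer = new StringTokenizer(line);
      while (tokenizer.hasMoreTokens()) {
        // transform to lower case
        word.set(tokenizer.nextToken().toLowerCase());
        // emit the word as both key and value
        context.write(word, word);
      }
    }
  }

  public static class Reduce extends Reducer<Text, Text, Text, Text> {

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
      // Concatenate every occurrence of the word, separated by spaces
      StringBuilder result = new StringBuilder();
      for (Text value : values) {
        result.append(value.toString()).append(" ");
      }
      context.write(key, new Text(result.toString()));
    }
  }

  // Driver (main method and job setup) is not shown in the question.
}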
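
To make the expected result concrete, the key-as-value idea can be simulated locally without a cluster. The following is only a minimal sketch in plain Java: the class `LocalWordSort` and the method `sortedWords` are hypothetical names, not part of the job above, and a `TreeMap` stands in for Hadoop's shuffle and sort. It assumes ASCII words, for which `String` ordering and `Text` ordering agree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordSort {

    // Hypothetical helper: mimics map -> shuffle/sort -> reduce in memory.
    public static List<String> sortedWords(String text) {
        // "Map" phase: emit (word, word). The TreeMap plays the role of the
        // shuffle: it groups values by key and keeps the keys sorted.
        Map<String, List<String>> shuffled = new TreeMap<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            String lower = tokenizer.nextToken().toLowerCase();
            shuffled.computeIfAbsent(lower, k -> new ArrayList<>()).add(lower);
        }

        // "Reduce" phase: emit every value of every key, so duplicates survive.
        List<String> result = new ArrayList<>();
        for (List<String> values : shuffled.values()) {
            result.addAll(values);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints "be be not or to to"
        System.out.println(String.join(" ", sortedWords("To be or not to be")));
    }
}
```

This only illustrates the ordering and the handling of duplicates; it says nothing about how the real job formats its output file.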

My question, however, is how I can simply write only the values to the output file. Right now I get this: