Hadoop grouping by key done wrong

Posted by yebdmbv4 on 2021-05-30 in Hadoop

This is my code:

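// SJob.java: configures and submits the job
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SJob {

    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Job job = new Job();
        job.setJarByClass(SJob.class);
        job.setJobName("SJob");

        FileInputFormat.addInputPath(job, new Path("/home/WORK/input/data.csv"));
        FileOutputFormat.setOutputPath(job, new Path("/home/WORK/output"));

        job.setMapperClass(SMapper.class);
        job.setReducerClass(SReducer.class);

        // Mapper and reducer both emit Text keys and Text values
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.waitForCompletion(true);
    }
}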

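// SMapper.java: splits each "key;value" line and emits (key, value)
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        String[] parts = line.split(";");

        context.write(new Text(parts[0]), new Text(parts[1]));
    }
}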

// SReducer.java: joins all values of a key and appends their count
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SReducer extends Reducer<Text, Text, Text, Text> {

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        String properties = "";
        int noOfElements = 0;

        for (Text value : values) {
            properties += value + " ";
            noOfElements++;
        }

        properties += "  " + noOfElements;

        context.write(key, new Text(properties));
    }
}

This is my input file:
1;a
2;a
3;a
4;a
1;b
2;b
3;b
4;b
1;c
2;c
3;c
4;c
This is my output file:
1 b c 2
2 a b c 3
3 a b c 3
4 a b c 3
1 a 1
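As you can see, the grouping by key is done wrong. The output should be:
1 a b c 3
2 a b c 3
3 a b c 3
4 a b c 3
Something seems to go wrong when the first line is processed. I tried swapping the first and second lines, and the same thing happened. In that case, instead of
2 a b c 3
I get
2 b c 2
2 a 1
What is the cause?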

hadoop mapreduce

Source: https://stackoverflow.com/questions/27313213/hadoop-grouping-by-key-done-wrong

1 answer

mpgws1up1#

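I found the problem.
Due to some constraints I am using Hadoop 0.20.2. The cause is a Hadoop bug that has been fixed in some versions, but not in the one I am using.
https://issues.apache.org/jira/browse/mapreduce-5777
This version does not handle UTF-8 files that start with a byte order mark (BOM). The BOM ends up inside the key of the first line, so that key no longer matches the same key on later lines and is grouped separately. The file needs to be saved as UTF-8 without a BOM.
After fixing that, everything works.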
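
To illustrate why a BOM splits one key into two groups, here is a minimal sketch in plain Java (no Hadoop needed). It assumes the input file starts with the UTF-8 BOM bytes EF BB BF; the class BomKeyDemo and the helpers stripBom and keyOf are hypothetical names used only for this illustration, not part of the original job.

```java
import java.nio.charset.StandardCharsets;

public class BomKeyDemo {

    // U+FEFF is the character the UTF-8 byte sequence EF BB BF decodes to.
    private static final char BOM = '\uFEFF';

    /** Hypothetical helper: drops a single leading BOM character if present. */
    public static String stripBom(String line) {
        if (!line.isEmpty() && line.charAt(0) == BOM) {
            return line.substring(1);
        }
        return line;
    }

    /** Mirrors the key extraction in SMapper: the text before the first ';'. */
    public static String keyOf(String line) {
        return line.split(";")[0];
    }

    public static void main(String[] args) {
        // First line of a file saved as "UTF-8 with BOM": EF BB BF, then "1;a"
        byte[] firstLineBytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '1', ';', 'a'};
        String firstLine = new String(firstLineBytes, StandardCharsets.UTF_8);
        String laterLine = "1;b";

        // The keys look identical when printed, but the first one carries the BOM
        System.out.println(keyOf(firstLine).equals(keyOf(laterLine)));           // false
        System.out.println(keyOf(stripBom(firstLine)).equals(keyOf(laterLine))); // true
    }
}
```

If the file cannot be re-saved without a BOM, one possible workaround (an assumption, not something the answer above tried) would be to apply a helper like stripBom to value.toString() in SMapper before splitting the line.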

Answered on 2021-05-30