map-reduce

Posted by xcitsw88 on 2021-05-27 in Hadoop


I am working on something with MapReduce. For example, I have a dataset like the following:
node-id node-id weight
1 2 7
1 3 20
2 3 3
3 1 5
4 1 9
5 6 10
This means the weight from node 1 to node 2 is 7, and so on. As a first step, I want to count the number of nodes, so I do the following in my map class:

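public static class countMapper
    extends Mapper<Text,Text,Text,Text>{
    // key is the from node; value holds the to node followed by the weight
    @Override
    public void map(Text key, Text value, Context context)
        throws IOException, InterruptedException{
        // pass the edge through, keyed by its from node
        context.write(key, value);
        // also emit the to node as a key (with an empty value) so it reaches a reducer
        context.write(new Text(value.toString().split(" ")[0]), new Text());
    }
}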

In other words, I write the from node and the to node once each so that the nodes can be counted.
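To make the map step concrete, here is a minimal plain-Java sketch (no Hadoop dependency) that mimics what countMapper emits for the sample dataset and collects the distinct keys. The class name MapEmitSketch and its helper methods are hypothetical, and it assumes each input line is split so that the first field is the key and the rest is the value, which is what the Mapper<Text,Text,Text,Text> signature suggests but the post does not state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the map step; not the Hadoop API.
class MapEmitSketch {
    static final String[] SAMPLE = {"1 2 7", "1 3 20", "2 3 3", "3 1 5", "4 1 9", "5 6 10"};

    // Mimics countMapper.map: returns the two (key, value) pairs written per edge.
    static List<String[]> map(String key, String value) {
        List<String[]> out = new ArrayList<>();
        out.add(new String[] {key, value});               // the edge, keyed by from node
        out.add(new String[] {value.split(" ")[0], ""});  // the to node with an empty value
        return out;
    }

    // Runs the map step over every line and collects the distinct output keys.
    static Set<String> distinctKeys(String[] lines) {
        Set<String> keys = new TreeSet<>();
        for (String line : lines) {
            // Assumed input split: first field is the key, the remainder is the value.
            String[] parts = line.split(" ", 2);
            for (String[] pair : map(parts[0], parts[1])) {
                keys.add(pair[0]);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(distinctKeys(SAMPLE).size());  // 6 distinct node ids
    }
}
```

Every node id shows up as a map output key at least once, so the number of distinct keys equals the number of nodes.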

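public static class countReducer
        extends Reducer<Text, Text, Text, Text>{

        // number of distinct keys (nodes) seen by this reduce task
        private long numNodes = 0;

        // called once per distinct key routed to this task
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException{
            numNodes += 1;
        }

        // called once when the task finishes; publishes this task's count
        @Override
        public void cleanup(Context context)
            throws IOException, InterruptedException{
            context.getCounter(PageRank.myCounter.numNodes).setValue(numNodes);
        }
}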

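I declare numNodes in the reducer class, and it seems to count the number of nodes. The result shows 6! Correct!! But I don't understand why it works. Isn't MapReduce a distributed system for handling these problems?
Each JVM runs its own reduce class. It seems the nodes should not be counted correctly, because there are many reducer instances and they do not communicate with each other. So I am confused by this. I ran it on AWS EC2 with 2 data nodes and 1 name node, and it also showed the correct number of nodes.
I even created a file with 12000000 nodes to simulate a much larger input, so that the file would be split. I also set mapred.reduce.tasks=2 to force multiple reducers. It still shows the correct answer.
Can anyone tell me why?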
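One possible explanation can be sketched in plain Java (no Hadoop dependency). As far as I understand, two things are at play: every occurrence of a key is routed to the same reduce task, and the job-level counter value is the sum of the values reported by the individual tasks. The class CounterAggregationSketch and its methods are hypothetical; the partition formula has the same shape as Hadoop's default HashPartitioner, with String.hashCode standing in for Text.hashCode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation of partitioning plus counter aggregation; not the Hadoop API.
class CounterAggregationSketch {
    // Map output keys for the six sample edges (from node and to node of each edge).
    static final List<String> KEYS =
        Arrays.asList("1", "2", "1", "3", "2", "3", "3", "1", "4", "1", "5", "6");

    // Same shape as the default hash partitioner: a given key always lands in one partition.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Each simulated reduce task counts only the distinct keys routed to it,
    // like the private numNodes field in countReducer.
    static long[] perTaskCounts(List<String> keys, int numReduceTasks) {
        List<Set<String>> groups = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            groups.add(new HashSet<>());
        }
        for (String key : keys) {
            groups.get(partition(key, numReduceTasks)).add(key);
        }
        long[] counts = new long[numReduceTasks];
        for (int i = 0; i < numReduceTasks; i++) {
            counts[i] = groups.get(i).size();
        }
        return counts;
    }

    // Assumed framework behavior: per-task counter values are summed into the job total.
    static long jobCounter(long[] perTask) {
        long total = 0;
        for (long c : perTask) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(jobCounter(perTaskCounts(KEYS, 2)));  // 6 for the sample keys
    }
}
```

Because the partitions are disjoint and together cover every key, the sum is the same for any number of reduce tasks, even though no task ever sees another task's numNodes.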

Java hadoop mapreduce reducers

Source: https://stackoverflow.com/questions/65148204/map-reduce-in-hadoop-counting-problems