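I have k-means code, and my task is to calculate the speedup. I have run it on different nodes of my uni cluster. But is it possible to change the number of mappers and/or reducers, so that I can check how the speedup changes when running on a single node?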
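Speedup is conventionally defined as S(n) = T(1) / T(n), where T(1) is the baseline run time and T(n) is the run time with n workers (nodes here, or map/reduce tasks in a single-node experiment). Parallel efficiency is S(n) / n. The following is a minimal Java sketch of that arithmetic. It assumes the times are the millisecond values printed by main(). The class name Speedup and the timings in main are hypothetical, not measured results.

```java
// Minimal sketch: speedup S(n) = T(1) / T(n) and efficiency E(n) = S(n) / n,
// computed from wall-clock times in milliseconds.
public class Speedup {

    // Speedup of a run taking tNMillis relative to a baseline taking t1Millis.
    static double speedup(long t1Millis, long tNMillis) {
        if (t1Millis <= 0 || tNMillis <= 0) {
            throw new IllegalArgumentException("times must be positive");
        }
        return (double) t1Millis / tNMillis;
    }

    // Parallel efficiency: speedup divided by the number of workers
    // (nodes, or map/reduce tasks when everything runs on one node).
    static double efficiency(long t1Millis, long tNMillis, int workers) {
        return speedup(t1Millis, tNMillis) / workers;
    }

    public static void main(String[] args) {
        // Hypothetical timings for illustration only.
        long t1 = 120_000;
        long t4 = 40_000;
        System.out.printf("speedup=%.2f efficiency=%.2f%n",
                speedup(t1, t4), efficiency(t1, t4, 4));
    }
}
```

Efficiency near 1.0 means the extra workers are fully used. A falling efficiency usually points to fixed per-job overhead or too little input per task.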
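While googling, I found that I can change the number of reducers with conf.setNumReduceTasks(2);. However, my output did not change at all (my output is the running time in milliseconds).
The code I use is from GitHub: https://github.com/himank/k-means/blob/master/src/kmeans.java. I made some changes for my own needs, but the main functionality is the same.
The main function is as follows: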

public static void main(String[] args) throws Exception {
    long startTime = System.currentTimeMillis();
    IN = args[0];
    OUT = args[1];
    String input = IN;
    String output = OUT + System.nanoTime();
    String again_input = output;
    int iteration = 0;
    boolean isdone = false;
    while (!isdone) {
        JobConf conf = new JobConf(KMeans.class);
        if (iteration == 0) {
            Path hdfsPath = new Path(input + CENTROID_FILE_NAME);
            DistributedCache.addCacheFile(hdfsPath.toUri(), conf);
        } else {
            Path hdfsPath = new Path(again_input + OUTPUT_FILE_NAME);
            DistributedCache.addCacheFile(hdfsPath.toUri(), conf);
        }
        conf.setJobName(JOB_NAME);
        //conf.setNumReduceTasks(2);
        conf.setMapOutputKeyClass(DoubleWritable.class);
        conf.setMapOutputValueClass(DoubleWritable.class);
        conf.setOutputKeyClass(DoubleWritable.class);
        conf.setOutputValueClass(Text.class);
        conf.setMapperClass(Map.class);
        conf.setNumMapTasks(4);
        conf.setReducerClass(Reduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(input + DATA_FILE_NAME));
        FileOutputFormat.setOutputPath(conf, new Path(output));
        JobClient.runJob(conf);

        Configuration configuration = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://127.0.0.1:9000"), configuration);
        Path filePath = new Path(output + OUTPUT_FILE_NAME);
        BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(filePath)));
        List<Double> centers_next = new ArrayList<Double>();
        String line = br.readLine();
        while (line != null) {
            String[] sp = line.split("\t| ");
            double c = Double.parseDouble(sp[0]);
            centers_next.add(c);
            line = br.readLine();
        }
        br.close();
        String prev;
        if (iteration == 0) {
            prev = input + CENTROID_FILE_NAME;
        } else {
            prev = again_input + OUTPUT_FILE_NAME;
        }
        Path prevfile = new Path(prev);
        // Reuse the FileSystem handle opened above instead of fetching it again.
        BufferedReader br1 = new BufferedReader(new InputStreamReader(fs.open(prevfile)));
        List<Double> centers_prev = new ArrayList<Double>();
        String l = br1.readLine();
        while (l != null) {
            String[] sp1 = l.split(SPLITTER);
            double d = Double.parseDouble(sp1[0]);
            centers_prev.add(d);
            l = br1.readLine();
        }
        br1.close();
        Collections.sort(centers_next);
        Collections.sort(centers_prev);
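        // Converged only if both centroid lists have the same size and every
        // sorted centroid moved by at most 0.1; the size check also keeps
        // it.next() from running past the end of centers_prev.
        isdone = centers_next.size() == centers_prev.size();
        Iterator<Double> it = centers_prev.iterator();
        for (double d : centers_next) {
            if (!isdone || Math.abs(it.next() - d) > 0.1) {
                isdone = false;
                break;
            }
        }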
        ++iteration;
        again_input = output;
        output = OUT + System.nanoTime();
    }
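    // Total wall-clock time of all k-means iterations in milliseconds,
    // including per-job submission and setup overhead.
    long endTime = System.currentTimeMillis();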
    long totalTime = endTime - startTime;
    System.out.println(totalTime);
}
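
Enabling setNumReduceTasks(2) in this driver needs care. With more than one reducer, each reducer writes its own part file (part-00000, part-00001, ...). If OUTPUT_FILE_NAME names a single part file, the driver would see only some of the centroids, and so would the file added to the DistributedCache for the next iteration. The convergence loop would then compare lists of different lengths.

The following plain-Java sketch shows one way to merge centroids from every part file before the convergence test. The names CentroidCheck, parseCentroids and converged are hypothetical. The sketch assumes each output line starts with the centroid value, as the split("\t| ") parsing above implies. In the Hadoop driver, you could gather the per-file lines by listing the output directory, for example with FileSystem.globStatus(new Path(output + "/part-*")), and then opening each returned path.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: merge centroids from all reducer outputs (one list of lines per
// part-* file) and test convergence against the previous centroids.
public class CentroidCheck {

    // Parse the first tab/space-separated field of each non-blank line as a
    // centroid, across all part files, and return the values sorted.
    static List<Double> parseCentroids(List<List<String>> partFiles) {
        List<Double> centers = new ArrayList<>();
        for (List<String> lines : partFiles) {
            for (String line : lines) {
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] sp = trimmed.split("[\t ]+");
                centers.add(Double.parseDouble(sp[0]));
            }
        }
        Collections.sort(centers);
        return centers;
    }

    // Both lists must already be sorted. Converged when they have the same,
    // non-zero size and each centroid moved by at most the tolerance.
    static boolean converged(List<Double> prev, List<Double> next, double tolerance) {
        if (prev.size() != next.size() || next.isEmpty()) {
            return false;
        }
        for (int i = 0; i < next.size(); i++) {
            if (Math.abs(prev.get(i) - next.get(i)) > tolerance) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical contents of part-00000 and part-00001 with two reducers.
        List<List<String>> parts = List.of(
                List.of("1.05\t3 7 9"),
                List.of("20.0\t18 22"));
        List<Double> next = parseCentroids(parts);
        List<Double> prev = List.of(1.0, 20.02);
        System.out.println(converged(prev, next, 0.1));
    }
}
```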

Also, I am new to Hadoop and MapReduce.

2 answers


twh00eeo1#

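The number of maps for a given job is usually driven by the number of input splits in the input files, not by setNumMapTasks() or the mapred.map.tasks parameter. One map task is spawned for each input split. The mapred.map.tasks parameter is only a hint to the InputFormat about the number of maps. You can increase the number of map tasks manually with setNumMapTasks(), but it will not set the number below what Hadoop determines by splitting the input data.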
http://wiki.apache.org/hadoop/howmanymapsandreduces
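The effect described above can be modeled in a few lines. This is a simplified sketch of the split-size rule used by the old-API (org.apache.hadoop.mapred) FileInputFormat for a single splittable file:

- goal size = total size / map hint
- split size = max(minSize, min(goalSize, blockSize))
- the last split may be up to 10% larger

It is an approximation written for illustration, and the class and method names are hypothetical. Real split counts also depend on the InputFormat, compression, and settings such as the minimum split size.

```java
// Simplified model (an illustration, not the Hadoop source) of how the
// old-API FileInputFormat turns a map-count hint into splits for one file.
public class SplitEstimate {

    // The last split may be up to 10% larger than the split size.
    static final double SPLIT_SLOP = 1.1;

    // Goal size from the map hint, capped at the block size and never below
    // the minimum split size (treated as at least 1 byte here).
    static long splitSize(long totalSize, int mapHint, long minSize, long blockSize) {
        long goalSize = totalSize / Math.max(mapHint, 1);
        return Math.max(Math.max(minSize, 1), Math.min(goalSize, blockSize));
    }

    // Number of splits (and therefore map tasks) for one file of this size.
    static int numSplits(long fileSize, int mapHint, long minSize, long blockSize) {
        long size = splitSize(fileSize, mapHint, minSize, blockSize);
        long remaining = fileSize;
        int splits = 0;
        while ((double) remaining / size > SPLIT_SLOP) {
            remaining -= size;
            splits++;
        }
        if (remaining != 0) {
            splits++; // leftover bytes become the final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 128 * mb; // hypothetical HDFS block size
        long smallFile = 100 * mb; // hypothetical input smaller than one block
        for (int hint : new int[] {1, 2, 4, 8}) {
            System.out.println("hint=" + hint + " -> maps="
                    + numSplits(smallFile, hint, 1, blockSize));
        }
        // A larger file is split by block size even when the hint asks for 1.
        System.out.println("300 MB, hint=1 -> maps=" + numSplits(300 * mb, 1, 1, blockSize));
    }
}
```

With these hypothetical sizes, the 100 MB file yields 1, 2, 4 and 8 maps for hints of 1, 2, 4 and 8, so for a small input the hint is what raises the number of maps. The 300 MB file still yields 3 splits with a hint of 1, because the split size is capped at the block size. This is the "not below what Hadoop determines" part.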

2021-05-30

i2loujxw2#

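The Apache MapReduce tutorial has more information.
How Many Maps?
The number of maps is usually driven by the total size of the inputs, that is, the total number of blocks of the input files.
The right level of parallelism for maps seems to be around 10-100 maps per node, although it has been set up to 300 maps for very CPU-light map tasks. Task setup takes a while, so it is best if the maps take at least a minute to execute.
Thus, if you expect 10 TB of input data and have a block size of 128 MB, you will end up with 82,000 maps, unless Configuration.set(MRJobConfig.NUM_MAPS, int) (which only provides a hint to the framework) is used to set it even higher.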
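The tutorial's figure is the input size divided by the block size: 10 TB / 128 MB = 10 * 2^40 / 2^27 = 81,920 blocks, which it rounds to 82,000. The following is a small Java sketch of that estimate. It assumes TB and MB are binary units, uses a hypothetical class name, and ignores the 10% slop on the last split.

```java
// Back-of-the-envelope estimate: roughly one map per input block.
public class MapCountEstimate {

    // Ceiling division: a partially filled last block still needs its own map.
    static long mapsFor(long inputBytes, long blockBytes) {
        return (inputBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long tenTB = 10L << 40;      // 10 TiB
        long blockSize = 128L << 20; // 128 MiB
        System.out.println(mapsFor(tenTB, blockSize)); // prints 81920
    }
}
```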
