Use a global variable in the reducer class

Posted by qmelpv7a on 2021-06-03 in Hadoop


I need to use a global variable in my MapReduce program. How can I set it in the code below and then use it in the reducer?

public class tfidf
{
  public static tfidfMap..............
  {
  }
  public static tfidfReduce.............
  {
  }
  public static void main(String args[])
  {
       Configuration conf=new Configuration();
       conf.set("","");
  }

}

hadoop hdfs mapreduce reduce global-variables

Source: https://stackoverflow.com/questions/16222205/use-global-variable-in-reudcer-class

3 Answers


v64noz0r1#

Template code could look something like this (the reducer is not shown, but the same principle applies):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Mapper.Context;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ToolExample extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // Job.getInstance replaces the deprecated new Job(conf) constructor.
        Job job = Job.getInstance(getConf());
        Configuration conf = job.getConfiguration();

        conf.set("strProp", "value");
        conf.setInt("intProp", 123);
        conf.setBoolean("boolProp", true);

        // rest of your config here
        // ..

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static class MyMapper extends
            Mapper<LongWritable, Text, LongWritable, Text> {
        private String strProp;
        private int intProp;
        private boolean boolProp;

        @Override
        protected void setup(Context context) throws IOException,
                InterruptedException {
            Configuration conf = context.getConfiguration();

            strProp = conf.get("strProp");
            intProp = conf.getInt("intProp", -1);
            boolProp = conf.getBoolean("boolProp", false);
        }
    }

    public static void main(String args[]) throws Exception {
        System.exit(ToolRunner.run(new ToolExample(), args));
    }
}
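
Since the question is about the reducer specifically, here is a minimal sketch of the reducer side. It assumes the same "intProp" property that run() sets above. The class name ConfiguredReducer, the helper readThreshold, the Text/IntWritable types and the threshold logic are all hypothetical, chosen only for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer; the key/value types are assumptions for illustration.
public class ConfiguredReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private int threshold;

    // Reads the job-wide value; -1 is the fallback when the property is unset.
    static int readThreshold(Configuration conf) {
        return conf.getInt("intProp", -1);
    }

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Runs once per reduce task, before any reduce() call.
        threshold = readThreshold(context.getConfiguration());
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        // Use the "global" value: emit only keys whose total reaches the threshold.
        if (sum >= threshold) {
            context.write(key, new IntWritable(sum));
        }
    }
}
```

The value is read-only from the task's point of view: changing the Configuration inside a task does not propagate to other tasks or back to the driver.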


juud5qan2#

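Hadoop counters (user-defined) are another kind of global variable. Their values can be checked after the job completes. For example, if you want to count the number of bad/good records in the input (processed by the various mappers/reducers), you can use counters. @Mo: counters should fit your requirement.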
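
A rough sketch of the good/bad record count with a user-defined counter might look like this. The class name RecordCountingMapper, the enum RecordQuality and the isGood rule (a non-blank line counts as good) are assumptions made up for this example, not part of the original answer.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper that tallies good and bad input records with counters.
public class RecordCountingMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

    // User-defined counters; Hadoop groups them under the enum's class name.
    public enum RecordQuality { GOOD, BAD }

    // Assumed validity rule for this sketch only: a non-blank line is "good".
    static boolean isGood(String line) {
        return !line.trim().isEmpty();
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        if (isGood(value.toString())) {
            context.getCounter(RecordQuality.GOOD).increment(1);
            context.write(value, NullWritable.get());
        } else {
            context.getCounter(RecordQuality.BAD).increment(1);
        }
    }
}
```

After job.waitForCompletion(true) returns, the driver can read the aggregated total with job.getCounters().findCounter(RecordCountingMapper.RecordQuality.BAD).getValue(). Counters suit values that tasks increment and the driver reads at the end; they are not a way to pass a value from the driver into a task.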


lmvvr0a83#

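In a cluster (rather than local) environment, each map or reduce task runs in its own JVM if the program is written in Java (programs in other languages run as separate processes). As a result, you cannot declare a static variable in a class, change its value partway through the MapReduce flow, and expect to see that value in another JVM. What you need is a shared object that the mapper/reducer can set and get values through.
There are a few ways to achieve this.
1. As Chris mentioned, use the Configuration set()/get() methods to pass values to the mapper and/or reducer. In this case, you must set the values on the Configuration object before creating the job.
2. Write the data to an HDFS file and read it from the mapper/reducer. Remember to clean up the HDFS file afterwards.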
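
The HDFS-file option could be sketched roughly as follows. The class name SharedValueFile and its three helper methods are hypothetical; the file location is whatever Path the caller chooses, and the sketch assumes the value fits on a single line of text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper for sharing one small value through a file.
public class SharedValueFile {

    // Driver side: write the value before submitting the job (overwrites an old file).
    static void write(Configuration conf, Path path, String value) throws IOException {
        FileSystem fs = path.getFileSystem(conf);
        try (Writer out = new OutputStreamWriter(fs.create(path, true), StandardCharsets.UTF_8)) {
            out.write(value);
        }
    }

    // Task side: call from setup() of a mapper or reducer to read the value back.
    static String read(Configuration conf, Path path) throws IOException {
        FileSystem fs = path.getFileSystem(conf);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
            return in.readLine();
        }
    }

    // Driver side: remove the file once the job has finished.
    static boolean cleanUp(Configuration conf, Path path) throws IOException {
        return path.getFileSystem(conf).delete(path, false);
    }
}
```

In a task, the Configuration would come from context.getConfiguration(). For a single small value the Configuration approach is usually simpler; a file is more useful when the data is too large to pass as a property.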

