Is it impossible to determine the output data types of Hadoop MapReduce at runtime?

Posted by 3b6akqbq on 2021-06-02 in Hadoop


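The Hadoop framework needs to know the output data types of the Mapper and the Reducer so that it can create instances of those types at runtime. It uses them to deserialize the values passed between the Mapper and the Reducer, and to serialize instances from the Reducer into the output file. We therefore have to tell the Hadoop framework about the output data on the Job object, for example: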

job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setOutputKeyClass(NullWritable.class);
job.setOutputValueClass(Text.class);
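To illustrate why a class token such as `Text.class` is useful to a framework, here is a minimal sketch in plain Java. It is not Hadoop code: `ClassTokenDemo`, `TextLike`, and `deserialize` are hypothetical names invented for this example, and the "serialized form" is just a string. The point is only that creating an empty instance reflectively requires knowing the concrete class.

```java
public class ClassTokenDemo {
    // Hypothetical stand-in for a Writable-like value type; not a Hadoop class.
    public static class TextLike {
        private String value = "";

        public void readFrom(String serialized) {
            this.value = serialized;
        }

        public String get() {
            return value;
        }
    }

    // Creates an empty instance from the class token, then fills it
    // from the serialized form. Without the token there is nothing to instantiate.
    public static <T extends TextLike> T deserialize(Class<T> type, String serialized) {
        try {
            T instance = type.getDeclaredConstructor().newInstance();
            instance.readFrom(serialized);
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        TextLike row = deserialize(TextLike.class, "AA,2021");
        System.out.println(row.get());
    }
}
```

In this sketch the caller passes `TextLike.class` explicitly, which plays the same role that the `job.setOutputKeyClass(...)` and `job.setOutputValueClass(...)` calls play in the job setup above.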

It is not possible to infer the types at runtime from the class definitions of the Mapper and Reducer, since Java generics use type erasure.
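For readers unfamiliar with type erasure, the following is a small sketch in plain Java. It is not Hadoop code: `ErasureDemo`, `Base`, and `Concrete` are hypothetical names for illustration. The first method shows that two instances of a generic type with different type arguments share one runtime class. The second shows a nuance worth knowing when reading the claim above: type arguments written in an `extends` clause remain readable through reflection. Whether a framework chooses to rely on that is a separate design question, and nothing here describes what Hadoop does internally.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    // Hypothetical stand-in for a generic base class such as Mapper; not a Hadoop class.
    public static class Base<K, V> { }

    // A subclass that fixes the type arguments in its declaration.
    public static class Concrete extends Base<String, Integer> { }

    // Instances of a generic type share one runtime class: the type
    // arguments of the variables are erased and not stored on the objects.
    public static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        return strings.getClass() == numbers.getClass();
    }

    // Type arguments written in an `extends` clause are kept as class
    // metadata and can be read back with reflection.
    public static Type[] declaredTypeArguments(Class<?> subclass) {
        Type parent = subclass.getGenericSuperclass();
        if (parent instanceof ParameterizedType) {
            return ((ParameterizedType) parent).getActualTypeArguments();
        }
        return new Type[0];
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
        for (Type t : declaredTypeArguments(Concrete.class)) {
            System.out.println(t.getTypeName());
        }
    }
}
```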

Suppose this is the Mapper class:

public static class SelectClauseMapper
    extends Mapper<LongWritable, Text, NullWritable, Text> {

    @Override
    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
        // Skip the header row; write every other row with a null key.
        if (!AirlineDataUtils.isHeader(value)) {
            StringBuilder output = AirlineDataUtils.mergeStringArray(
                AirlineDataUtils.getSelectResultsPerRow(value),
                ",");
            context.write(NullWritable.get(), new Text(output.toString()));
        }
    }
}

Can someone explain, in the context of the example above, why the output types cannot be determined at runtime?

hadoop mapreduce serialization Generics type-erasure

Source: https://stackoverflow.com/questions/30099591/determining-the-type-of-output-data-of-hadoop-mapreduce-at-runtime-not-possible