Hadoop program

jm81lzqq posted on 2021-06-03 in Hadoop


This may be a basic question, but in a MapReduce program I want to read the names of all the files in the input folder, not their contents, and pass those file names to my mapper class.

Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "Analysis");
job.setInputFormatClass(KeyValueTextInputFormat.class);
//Path pa = new Path("hdfs://localhost:54310/home/aparajith");
//pa.

FileInputFormat.addInputPath(job, new Path("/hduser/"));
FileOutputFormat.setOutputPath(job, new Path("/CrawlerOutput23/"));

job.setJarByClass(mapper.Mapper1.class);

job.setMapperClass(mapper.Mapper1.class);
job.setReducerClass(mapper.Reducer1.class);

job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
System.exit(job.waitForCompletion(true) ? 0 : -1);

This is my main class; I can't seem to figure it out.

Tags: java, hadoop, mapreduce, reducers, mapper

Source: https://stackoverflow.com/questions/16029856/hadoop-program-without-reading-contents-of-the-file

2 answers


2ledvvac1#

If you want, in the mapper, the name of the file that the current key and value came from:

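In the mapper you can ignore the incoming key and value (with the default TextInputFormat, a LongWritable key holding the byte offset in the file and a Text value holding the line; with the KeyValueTextInputFormat set in the question, both key and value are Text, so the signature becomes map(Text key, Text value, Context context)) and do the following: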

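@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    // FileSplit is org.apache.hadoop.mapreduce.lib.input.FileSplit; the cast assumes a
    // FileInputFormat-based input format. getName() returns only the last path component.
    String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
    // insert remaining mapper logic here
}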

This gets the name of the file from which the mapper's current key and value were read.
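getPath().getName() yields only the last component of the split's path, while getPath().toString() yields the fully qualified URI. The JDK-only sketch below mimics that distinction so it can run without Hadoop on the classpath; the class SplitPathNames and its methods are hypothetical, and it assumes a file path with no trailing slash and no characters that need URI escaping.

```java
import java.net.URI;

// Hypothetical JDK-only helper, not Hadoop code: it mimics the strings a mapper
// can derive from ((FileSplit) context.getInputSplit()).getPath().
// Assumes no trailing slash and no characters that need URI escaping.
final class SplitPathNames {
    private SplitPathNames() {}

    // Like Path.getName(): only the last path component.
    static String name(String fullPath) {
        return fullPath.substring(fullPath.lastIndexOf('/') + 1);
    }

    // Like Path.toUri().getPath(): the path without scheme and authority.
    static String pathOnly(String fullPath) {
        return URI.create(fullPath).getPath();
    }
}
```

For example, name("hdfs://localhost:54310/hduser/page-001.txt") gives "page-001.txt". The short name is convenient, but two input directories containing files with the same name would collide; in that case the full getPath().toString() is the safer key.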

If you only want the names of the files in the directory as the mapper's input:

You can iterate over the files in your input directory (yourInputDirPath) and write a new file containing their names (inputDirFilenamesPath), like this:

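// fs is an org.apache.hadoop.fs.FileSystem, e.g. FileSystem.get(conf).
// Keep inputDirFilenamesPath outside yourInputDirPath, otherwise the newly
// created file can end up in its own listing.
try (FSDataOutputStream stream = fs.create(inputDirFilenamesPath)) {
    // false = do not recurse into subdirectories
    RemoteIterator<LocatedFileStatus> it = fs.listFiles(yourInputDirPath, false);
    while (it.hasNext()) {
        stream.write(it.next().getPath().toString().getBytes(StandardCharsets.UTF_8));
        stream.write('\n');
    }
}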

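Then you can simply add this file to the MR job's input with FileInputFormat.addInputPath(job, inputDirFilenamesPath); each line of the file, i.e. one file path, then reaches the mapper as a separate record.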
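Which half of each record holds the file path depends on the input format. The JDK-only simulation below mimics the records produced by TextInputFormat and by the KeyValueTextInputFormat that the question sets; the class ManifestRecords is hypothetical, and it assumes the whole file forms a single split with '\n' line endings. With TextInputFormat the path is the value and the key is the line's byte offset; with KeyValueTextInputFormat a line containing no tab becomes the key and the value is empty.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical JDK-only simulation, not Hadoop code: the (key, value) records a
// mapper would see when a '\n'-separated file of paths is the job input.
// Assumes the whole file is one split and lines end in '\n' (no '\r').
final class ManifestRecords {
    private ManifestRecords() {}

    // Splits into lines; a trailing '\n' does not start an extra empty line.
    private static List<String> lines(String content) {
        if (content.isEmpty()) {
            return new ArrayList<>();
        }
        List<String> parts = new ArrayList<>(Arrays.asList(content.split("\n", -1)));
        if (content.endsWith("\n")) {
            parts.remove(parts.size() - 1);
        }
        return parts;
    }

    // TextInputFormat: key = byte offset of the line start, value = the line.
    static List<Map.Entry<Long, String>> textInputFormat(String content) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        for (String line : lines(content)) {
            records.add(new SimpleImmutableEntry<>(offset, line));
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
        }
        return records;
    }

    // KeyValueTextInputFormat: split at the first tab; a line with no tab
    // (such as a bare file path) becomes the key and the value is empty.
    static List<Map.Entry<String, String>> keyValueTextInputFormat(String content) {
        List<Map.Entry<String, String>> records = new ArrayList<>();
        for (String line : lines(content)) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                records.add(new SimpleImmutableEntry<>(line, ""));
            } else {
                records.add(new SimpleImmutableEntry<>(line.substring(0, tab), line.substring(tab + 1)));
            }
        }
        return records;
    }
}
```

So with the question's KeyValueTextInputFormat, the mapper would read the file path from its key rather than its value.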

2021-06-03

oalqel3c2#

The simplest solution is to put the names of all the files in that directory into a single file and use that file as the job's input.
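As a sketch of this approach on a local directory, the example below uses java.nio.file instead of the Hadoop FileSystem API; the class FileNameManifest and its method are hypothetical. It lists only the regular files directly inside the directory (non-recursive, like fs.listFiles(yourInputDirPath, false) in the first answer), sorts them so the output is deterministic, and writes one absolute path per line.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Local-filesystem sketch (java.nio.file, not the Hadoop FileSystem API) of
// "put all file names from a directory into one file".
final class FileNameManifest {
    private FileNameManifest() {}

    // Writes the absolute paths of the regular files directly inside inputDir
    // (no recursion), one per line, to manifest; returns the paths written.
    // Keep manifest outside inputDir so a rerun does not pick up the manifest itself.
    static List<Path> writeManifest(Path inputDir, Path manifest) throws IOException {
        List<Path> files;
        try (Stream<Path> entries = Files.list(inputDir)) {
            files = entries.filter(Files::isRegularFile)
                    .map(p -> p.toAbsolutePath().normalize())
                    .sorted() // Files.list order is unspecified
                    .collect(Collectors.toList());
        }
        try (BufferedWriter out = Files.newBufferedWriter(manifest, StandardCharsets.UTF_8)) {
            for (Path file : files) {
                out.write(file.toString());
                out.write('\n');
            }
        }
        return files;
    }
}
```

The resulting file could then be copied into HDFS (for example with hdfs dfs -put) and added with FileInputFormat.addInputPath. Note that it contains local paths, so this fits when the mapper only needs the names, not when it must open those files from HDFS.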

2021-06-03