Hadoop MapReduce: mapper reads multiple lines instead of one line

Posted by c90pui9n on 2021-07-13 in Hadoop


I'm fairly new to Hadoop MapReduce. I'm working on a project that needs to read a data file like the following:

[Event "Rated Classical game"]
[Site "https://lichess.org/j1dkb5dw"]
[White "BFG9k"]
[Black "mamalak"]
[Result "1-0"]
[UTCDate "2012.12.31"]
[UTCTime "23:01:03"]
[WhiteElo "1639"]
[BlackElo "1403"]
[WhiteRatingDiff "+5"]
[BlackRatingDiff "-8"]
[ECO "C00"]
[Opening "French Defense: Normal Variation"]
[TimeControl "600+8"]
[Termination "Normal"]

1. e4 e6 2. d4 b6 3. a3 Bb7 4. Nc3 Nh6 5. Bxh6 gxh6 6. Be2 Qg5 7. Bg4 h5 8. Nf3 Qg6 9. Nh4 Qg5 10. Bxh5 Qxh4 11. Qf3 Kd8 12. Qxf7 Nc6 13. Qe8# 1-0

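However, I don't know how to read multiple lines in the mapper, because the mapper normally receives one line per call. I tried using NLineInputFormat, but it did not give me the whole game record in a single call.
Here is my driver code (I'm only testing the mapper for now, so I set the number of reduce tasks to zero):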

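import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ChessAnalysis {

  public static void main(String[] args) throws Exception {

    if (args.length != 2) {
      System.err.println("Usage: ChessAnalysis <input dir> <output dir>");
      System.exit(-1);
    }

    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "Project: Chess Analysis");

    // NLineInputFormat puts 18 input lines into each split (one map task per split);
    // map() is still called once per line within that split.
    job.setInputFormatClass(NLineInputFormat.class);
    NLineInputFormat.addInputPath(job, new Path(args[0]));
    job.getConfiguration().setInt("mapreduce.input.lineinputformat.linespermap", 18);
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setJarByClass(ChessAnalysis.class);
    job.setMapperClass(ChessMapper.class);
    // The reducer class is ignored while the number of reduce tasks is 0.
    job.setReducerClass(ChessReducer.class);

    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);

    // Map-only job: mapper output is written directly to the output directory.
    job.setNumReduceTasks(0);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    boolean success = job.waitForCompletion(true);
    System.exit(success ? 0 : 1);
  }
}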

I would appreciate any help. Thanks.
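To make the goal concrete, here is a minimal sketch of the grouping that is needed, written in plain Java with no Hadoop dependency so it can be run on its own. The class name PgnGameAccumulator, its methods, and the "Moves" key are hypothetical names chosen for illustration; they are not part of Hadoop or of the project above. It assumes every game starts with an [Event ...] tag line, as in the sample data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: collects PGN lines, fed one at a time, into one record per game. */
class PgnGameAccumulator {
    // Tags and moves of the game currently being read.
    private final Map<String, String> current = new LinkedHashMap<>();
    // Games that have been read completely.
    private final List<Map<String, String>> completed = new ArrayList<>();

    /** Feed a single input line, the way a mapper receives one line per call. */
    public void accept(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return; // blank lines only separate the tag block from the moves
        }
        if (trimmed.startsWith("[") && trimmed.endsWith("]")) {
            // Assumption: an [Event ...] tag marks the start of a new game.
            if (trimmed.startsWith("[Event ") && !current.isEmpty()) {
                flush();
            }
            int space = trimmed.indexOf(' ');
            if (space < 0) {
                return; // not a "[Key "value"]" tag line; ignore it
            }
            String key = trimmed.substring(1, space);
            String value = trimmed.substring(space + 1, trimmed.length() - 1).trim();
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1); // drop surrounding quotes
            }
            current.put(key, value);
        } else {
            // Move text; join it with a space if it spans several lines.
            current.merge("Moves", trimmed, (a, b) -> a + " " + b);
        }
    }

    /** Call once after the last line so the final game is not lost. */
    public void finish() {
        if (!current.isEmpty()) {
            flush();
        }
    }

    /** Returns the games collected so far, each as an ordered tag-to-value map. */
    public List<Map<String, String>> games() {
        return completed;
    }

    private void flush() {
        completed.add(new LinkedHashMap<>(current));
        current.clear();
    }
}
```

In a mapper, the same idea could be applied by calling accept() from map() and writing out the collected games in cleanup(). This only yields whole games if no game is cut in two by a split boundary, which the fixed 18-lines-per-split setting guarantees only when every game occupies exactly 18 lines.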

hadoop mapreduce

Source: https://stackoverflow.com/questions/66630323/hadoop-mapreduce-mapper-read-multiple-line-instead-of-one-line