I'm still fairly new to Hadoop MapReduce. I'm working on a project that reads data files like the following:
[Event "Rated Classical game"]
[Site "https://lichess.org/j1dkb5dw"]
[White "BFG9k"]
[Black "mamalak"]
[Result "1-0"]
[UTCDate "2012.12.31"]
[UTCTime "23:01:03"]
[WhiteElo "1639"]
[BlackElo "1403"]
[WhiteRatingDiff "+5"]
[BlackRatingDiff "-8"]
[ECO "C00"]
[Opening "French Defense: Normal Variation"]
[TimeControl "600+8"]
[Termination "Normal"]
1. e4 e6 2. d4 b6 3. a3 Bb7 4. Nc3 Nh6 5. Bxh6 gxh6 6. Be2 Qg5 7. Bg4 h5 8. Nf3 Qg6 9. Nh4 Qg5 10. Bxh5 Qxh4 11. Qf3 Kd8 12. Qxf7 Nc6 13. Qe8# 1-0
But I don't know how to read multiple lines at a time in the mapper, since the mapper is normally invoked once per line. I tried NLineInputFormat, but it didn't quite work.
Here is my driver code (I'm only testing the mapper for now, so I set the number of reduce tasks to zero):
public static void main(String[] args) throws Exception {
    if (args.length != 2) {
        System.out.printf("Usage: AvgWordLength <input dir> <output dir>\n");
        System.exit(-1);
    }
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "Project: Chess Analysis");

    // Each game record in the file is 18 lines, so I ask
    // NLineInputFormat for 18 lines per map.
    job.setInputFormatClass(NLineInputFormat.class);
    NLineInputFormat.addInputPath(job, new Path(args[0]));
    job.getConfiguration().setInt("mapreduce.input.lineinputformat.linespermap", 18);

    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setJarByClass(ChessAnalysis.class);
    job.setMapperClass(ChessMapper.class);
    job.setReducerClass(ChessReducer.class); // ignored while reduce tasks = 0
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);
    job.setNumReduceTasks(0);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    boolean success = job.waitForCompletion(true);
    System.exit(success ? 0 : 1);
}
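For reference, once a mapper does receive all the lines of one game as a single record, the parsing I have in mind would look roughly like this. This is just a plain-Java sketch, independent of Hadoop; the class and method names here are my own placeholders, not anything from the Hadoop API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: extract the [Tag "value"] pairs from one
// multi-line PGN game record (the 18 lines of a single game).
public class PgnRecordParser {

    public static Map<String, String> parseTags(String record) {
        Map<String, String> tags = new HashMap<>();
        for (String line : record.split("\n")) {
            line = line.trim();
            // Tag pair lines look like: [WhiteElo "1639"]
            if (line.startsWith("[") && line.endsWith("]")) {
                int firstQuote = line.indexOf('"');
                int lastQuote = line.lastIndexOf('"');
                if (firstQuote > 0 && lastQuote > firstQuote) {
                    String name = line.substring(1, firstQuote).trim();
                    String value = line.substring(firstQuote + 1, lastQuote);
                    tags.put(name, value);
                }
            }
            // The move-text line ("1. e4 e6 ...") is simply skipped here.
        }
        return tags;
    }

    public static void main(String[] args) {
        String record = "[White \"BFG9k\"]\n"
                      + "[Black \"mamalak\"]\n"
                      + "[Result \"1-0\"]\n"
                      + "1. e4 e6 2. d4 b6";
        Map<String, String> tags = parseTags(record);
        // prints: BFG9k vs mamalak: 1-0
        System.out.println(tags.get("White") + " vs "
                + tags.get("Black") + ": " + tags.get("Result"));
    }
}
```

The open question for me is only how to get Hadoop to hand the mapper those 18 lines as one value in the first place.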
I'd appreciate any help, thanks.