I am trying to understand the Hadoop framework and its MapReduce functionality.
Environment: Windows 7 with Hadoop 0.19.1 running under Cygwin, and Eclipse Europa for developing MapReduce jobs.
The problem:
The sample code uses the default identity Mapper and Reducer classes. The goal is simply to copy the files from the input folder to the output folder without any processing of the data.
The TestDriver.java file contains:
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class TestDriver {

    public static void main(String[] args) {
        JobClient client = new JobClient();
        JobConf conf = new JobConf(TestDriver.class);

        // TODO: specify output types
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        // TODO: specify input and output DIRECTORIES (not files)
        //conf.setInputPath(new Path("src"));
        //conf.setOutputPath(new Path("out"));
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path("In"));
        FileOutputFormat.setOutputPath(conf, new Path("Out"));

        // TODO: specify a mapper
        conf.setMapperClass(org.apache.hadoop.mapred.lib.IdentityMapper.class);

        // TODO: specify a reducer
        conf.setReducerClass(org.apache.hadoop.mapred.lib.IdentityReducer.class);

        client.setConf(conf);
        try {
            JobClient.runJob(conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The Hadoop cluster is started from the Cygwin terminal. Output of the jps command:
$ jps
4944 NameNode
6588 SecondaryNameNode
8504 TaskTracker
8640 JobTracker
8340 DataNode
8568 Jps
The hadoop-site.xml file contains the following:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9100</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9101</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
There is no yarn-site.xml file in my Hadoop installation path under Cygwin. Is it required? Could that cause any problems?
In Eclipse, a Hadoop Map/Reduce location was created with Map/Reduce master port 9101 and DFS master port 9100. When the program is run, the following appears on the console:
16/07/14 12:06:35 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
16/07/14 12:06:35 INFO mapred.FileInputFormat: Total input paths to process : 4
16/07/14 12:06:37 INFO mapred.JobClient: Running job: job_201607141149_0002
16/07/14 12:06:38 INFO mapred.JobClient: map 0% reduce 0%
Nothing shows up in the Cygwin TaskTracker window. Below is the "dump from job tracker" window:
16/07/14 12:05:24 ERROR mapred.EagerTaskInitializationListener: Job initialization failed:
java.util.regex.PatternSyntaxException: \k is not followed by '<' for named capturing group near index 48
localhost_[0-9]+_job_201607141149_0001_hcltech\kaushik.srinivas_\Qtest_TestDriver.java-3370112959319510186.jar\E+
^
at java.util.regex.Pattern.error(Pattern.java:1924)
at java.util.regex.Pattern.escape(Pattern.java:2367)
at java.util.regex.Pattern.atom(Pattern.java:2164)
at java.util.regex.Pattern.sequence(Pattern.java:2097)
at java.util.regex.Pattern.expr(Pattern.java:1964)
at java.util.regex.Pattern.compile(Pattern.java:1665)
at java.util.regex.Pattern.<init>(Pattern.java:1337)
at java.util.regex.Pattern.compile(Pattern.java:1022)
at org.apache.hadoop.mapred.JobHistory$JobInfo.getJobHistoryFileName(JobHistory.java:638)
at org.apache.hadoop.mapred.JobHistory$JobInfo.logSubmitted(JobHistory.java:803)
at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:360)
at org.apache.hadoop.mapred.EagerTaskInitializationListener$JobInitThread.run(EagerTaskInitializationListener.java:55)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "initJobs" java.util.regex.PatternSyntaxException: \k is not followed by '<' for named capturing group near index 48
localhost_[0-9]+_job_201607141149_0001_hcltech\kaushik.srinivas_\Qtest_TestDriver.java-3370112959319510186.jar\E+
^
at java.util.regex.Pattern.error(Pattern.java:1924)
at java.util.regex.Pattern.escape(Pattern.java:2367)
at java.util.regex.Pattern.atom(Pattern.java:2164)
at java.util.regex.Pattern.sequence(Pattern.java:2097)
at java.util.regex.Pattern.expr(Pattern.java:1964)
at java.util.regex.Pattern.compile(Pattern.java:1665)
at java.util.regex.Pattern.<init>(Pattern.java:1337)
at java.util.regex.Pattern.compile(Pattern.java:1022)
at org.apache.hadoop.mapred.JobHistory$JobInfo.getJobHistoryFileName(JobHistory.java:638)
at org.apache.hadoop.mapred.JobHistory$JobInfo.finalizeRecovery(JobHistory.java:746)
at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:1549)
at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:2320)
at org.apache.hadoop.mapred.JobInProgress.terminateJob(JobInProgress.java:2004)
at org.apache.hadoop.mapred.JobInProgress.terminate(JobInProgress.java:2019)
at org.apache.hadoop.mapred.JobInProgress.fail(JobInProgress.java:2095)
at org.apache.hadoop.mapred.EagerTaskInitializationListener$JobInitThread.run(EagerTaskInitializationListener.java:62)
at java.lang.Thread.run(Thread.java:745)
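As far as I can tell, Hadoop builds a job-history filename pattern that embeds the OS user name, which on my Windows machine contains a backslash (hcltech\kaushik.srinivas). The unescaped backslash before the "k" is then read by java.util.regex as the start of a \k<name> named back-reference. The following standalone sketch (pattern string copied from the stack trace above, class name is my own) reproduces the same PatternSyntaxException outside Hadoop:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class BackslashRegexDemo {
    public static void main(String[] args) {
        // "\\k" in Java source is the two characters \k in the regex.
        // java.util.regex treats \k as the start of a \k<name> named
        // back-reference and fails because no '<' follows it.
        String pattern = "localhost_[0-9]+_job_201607141149_0001_hcltech"
                + "\\kaushik.srinivas_"
                + "\\Qtest_TestDriver.java-3370112959319510186.jar\\E+";
        try {
            Pattern.compile(pattern);
            System.out.println("compiled OK");
        } catch (PatternSyntaxException e) {
            // Prints the same description as in the JobTracker log
            System.out.println(e.getDescription());
        }
    }
}
```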
Can anyone help me understand this problem?