Unable to run Hadoop MapReduce code on Windows with Cygwin

Posted by daolsyd0 on 2021-06-02 in Hadoop

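I am trying to understand the Hadoop framework and its MapReduce functionality.
Environment: Windows 7 with Cygwin running Hadoop 0.19.1, and Eclipse Europa for developing MapReduce jobs.
The problem:
The sample code uses the default identity mapper and reducer classes. The goal is to copy the files from the input folder to the output folder without processing the data in any way.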

TestDriver.java contains:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class TestDriver {

    public static void main(String[] args) {

        JobConf conf = new JobConf(TestDriver.class);

        // TextInputFormat yields (LongWritable byte offset, Text line) pairs, and the
        // identity mapper and reducer pass them through unchanged, so the declared
        // output types must be LongWritable and Text.
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // Input and output are DIRECTORIES (not files).
        FileInputFormat.setInputPaths(conf, new Path("In"));
        FileOutputFormat.setOutputPath(conf, new Path("Out"));

        // Identity mapper and reducer: records are passed through without processing.
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);

        try {
            // The static runJob submits the job and blocks until it completes.
            JobClient.runJob(conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}

The Hadoop cluster was started from the Cygwin terminal. Output of the jps command:

$ jps
4944 NameNode
6588 SecondaryNameNode
8504 TaskTracker
8640 JobTracker
8340 DataNode
8568 Jps

The hadoop-site.xml file contains the following:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9100</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9101</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

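My Hadoop installation path under Cygwin does not contain a yarn-site.xml file. Is it required? Could its absence cause any problems?
In Eclipse, I created a Hadoop Map/Reduce location with Map/Reduce master port 9101 and DFS master port 9100. When I run the program, the console shows the following: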

16/07/14 12:06:35 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
16/07/14 12:06:35 INFO mapred.FileInputFormat: Total input paths to process : 4
16/07/14 12:06:37 INFO mapred.JobClient: Running job: job_201607141149_0002
16/07/14 12:06:38 INFO mapred.JobClient:  map 0% reduce 0%

Nothing is displayed in the TaskTracker window in Cygwin. Below is the dump from the JobTracker window:

16/07/14 12:05:24 ERROR mapred.EagerTaskInitializationListener: Job initialization failed:
java.util.regex.PatternSyntaxException: \k is not followed by '<' for named capturing group near index 48
localhost_[0-9]+_job_201607141149_0001_hcltech\kaushik.srinivas_\Qtest_TestDriver.java-3370112959319510186.jar\E+
                                                ^
        at java.util.regex.Pattern.error(Pattern.java:1924)
        at java.util.regex.Pattern.escape(Pattern.java:2367)
        at java.util.regex.Pattern.atom(Pattern.java:2164)
        at java.util.regex.Pattern.sequence(Pattern.java:2097)
        at java.util.regex.Pattern.expr(Pattern.java:1964)
        at java.util.regex.Pattern.compile(Pattern.java:1665)
        at java.util.regex.Pattern.<init>(Pattern.java:1337)
        at java.util.regex.Pattern.compile(Pattern.java:1022)
        at org.apache.hadoop.mapred.JobHistory$JobInfo.getJobHistoryFileName(JobHistory.java:638)
        at org.apache.hadoop.mapred.JobHistory$JobInfo.logSubmitted(JobHistory.java:803)
        at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:360)
        at org.apache.hadoop.mapred.EagerTaskInitializationListener$JobInitThread.run(EagerTaskInitializationListener.java:55)
        at java.lang.Thread.run(Thread.java:745)

Exception in thread "initJobs" java.util.regex.PatternSyntaxException: \k is not followed by '<' for named capturing group near index 48
localhost_[0-9]+_job_201607141149_0001_hcltech\kaushik.srinivas_\Qtest_TestDriver.java-3370112959319510186.jar\E+
                                                ^
        at java.util.regex.Pattern.error(Pattern.java:1924)
        at java.util.regex.Pattern.escape(Pattern.java:2367)
        at java.util.regex.Pattern.atom(Pattern.java:2164)
        at java.util.regex.Pattern.sequence(Pattern.java:2097)
        at java.util.regex.Pattern.expr(Pattern.java:1964)
        at java.util.regex.Pattern.compile(Pattern.java:1665)
        at java.util.regex.Pattern.<init>(Pattern.java:1337)
        at java.util.regex.Pattern.compile(Pattern.java:1022)
        at org.apache.hadoop.mapred.JobHistory$JobInfo.getJobHistoryFileName(JobHistory.java:638)
        at org.apache.hadoop.mapred.JobHistory$JobInfo.finalizeRecovery(JobHistory.java:746)
        at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:1549)
        at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:2320)
        at org.apache.hadoop.mapred.JobInProgress.terminateJob(JobInProgress.java:2004)
        at org.apache.hadoop.mapred.JobInProgress.terminate(JobInProgress.java:2019)
        at org.apache.hadoop.mapred.JobInProgress.fail(JobInProgress.java:2095)
        at org.apache.hadoop.mapred.EagerTaskInitializationListener$JobInitThread.run(EagerTaskInitializationListener.java:62)
        at java.lang.Thread.run(Thread.java:745)
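
The exception itself can be reproduced outside Hadoop. Below is a minimal sketch, assuming (as the trace suggests, though I have not confirmed it in the Hadoop source) that the job history file name pattern is built by splicing the user name into a regular expression without quoting it. The user name hcltech\kaushik.srinivas and the pattern prefix are taken from the log above; the class and helper names are made up for illustration and are not Hadoop APIs.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Stand-alone reproduction; none of these names come from Hadoop itself.
final class BackslashUserNameDemo {

    // Prefix copied from the pattern printed in the JobTracker log.
    static final String PREFIX = "localhost_[0-9]+_job_201607141149_0001_";

    // Hypothetical helper: splices the user name into the regex as-is.
    static String unquotedRegex(String user) {
        return PREFIX + user + "_.*";
    }

    // Hypothetical helper: same pattern, but the user name is matched literally.
    static String quotedRegex(String user) {
        return PREFIX + Pattern.quote(user) + "_.*";
    }

    // Returns true if the regex compiles, false on PatternSyntaxException.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            System.out.println("compile failed: " + e.getDescription());
            return false;
        }
    }

    public static void main(String[] args) {
        // Java source for the literal text hcltech\kaushik.srinivas
        String user = "hcltech\\kaushik.srinivas";

        // The backslash followed by 'k' is parsed as a regex escape and rejected.
        System.out.println("unquoted compiles: " + compiles(unquotedRegex(user)));

        // Quoting the user name makes the backslash an ordinary character.
        System.out.println("quoted compiles: " + compiles(quotedRegex(user)));

        // Made-up file name in the same shape as the one in the log.
        String fileName = "localhost_1468478940000_job_201607141149_0001_" + user + "_test.jar";
        System.out.println("quoted matches: " + Pattern.matches(quotedRegex(user), fileName));
    }
}
```

If this is indeed the cause, the failure comes from the backslash in the Windows domain-style user name rather than from the driver code, and running the job under a user name without a backslash should avoid it. I have not verified how to configure that in Hadoop 0.19.1.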

Can anyone help me understand this problem?
