Hadoop WordCount example - NullPointerException

tzdcorbm, posted 2021-05-29 in Hadoop

I'm a Hadoop beginner. My setup: RHEL 7, hadoop-2.7.3.
I'm trying to run the WordCount v2.0 example. I just copied the source code into a new Eclipse project and exported it to a ws.jar file.
I have already configured Hadoop for pseudo-distributed operation following the linked instructions. Then I proceeded as follows.
Creating the input files in the input directory:

echo "Hello World, Bye World!" > input/file01
echo "Hello Hadoop, Goodbye to hadoop." > input/file02

Starting the environment and submitting the job:

sbin/start-dfs.sh
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/<username>
bin/hdfs dfs -put input input
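# Optional sanity check (not part of the original steps): confirm the
# files actually landed in HDFS before submitting the job.
bin/hdfs dfs -ls input
bin/hdfs dfs -cat input/file01 input/file02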
bin/hadoop jar ws.jar WordCount2 input output

This is what I get:

16/09/02 13:15:01 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
16/09/02 13:15:01 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
16/09/02 13:15:01 INFO input.FileInputFormat: Total input paths to process : 2
16/09/02 13:15:01 INFO mapreduce.JobSubmitter: number of splits:2
16/09/02 13:15:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local455553963_0001
16/09/02 13:15:01 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/09/02 13:15:01 INFO mapreduce.Job: Running job: job_local455553963_0001
16/09/02 13:15:01 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/09/02 13:15:01 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/09/02 13:15:01 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
16/09/02 13:15:02 INFO mapred.LocalJobRunner: Waiting for map tasks
16/09/02 13:15:02 INFO mapred.LocalJobRunner: Starting task: attempt_local455553963_0001_m_000000_0
16/09/02 13:15:02 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/09/02 13:15:02 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
16/09/02 13:15:02 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/aii/input/file02:0+33
16/09/02 13:15:02 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
16/09/02 13:15:02 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
16/09/02 13:15:02 INFO mapred.MapTask: soft limit at 83886080
16/09/02 13:15:02 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
16/09/02 13:15:02 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
16/09/02 13:15:02 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
16/09/02 13:15:02 INFO mapred.MapTask: Starting flush of map output
16/09/02 13:15:02 INFO mapred.LocalJobRunner: Starting task: attempt_local455553963_0001_m_000001_0
16/09/02 13:15:02 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/09/02 13:15:02 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
16/09/02 13:15:02 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/aii/input/file01:0+24
16/09/02 13:15:02 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
16/09/02 13:15:02 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
16/09/02 13:15:02 INFO mapred.MapTask: soft limit at 83886080
16/09/02 13:15:02 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
16/09/02 13:15:02 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
16/09/02 13:15:02 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
16/09/02 13:15:02 INFO mapred.MapTask: Starting flush of map output
16/09/02 13:15:02 INFO mapred.LocalJobRunner: map task executor complete.
16/09/02 13:15:02 WARN mapred.LocalJobRunner: job_local455553963_0001
java.lang.Exception: java.lang.NullPointerException
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.NullPointerException
    at WordCount2$TokenizerMapper.setup(WordCount2.java:47)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
16/09/02 13:15:02 INFO mapreduce.Job: Job job_local455553963_0001 running in uber mode : false
16/09/02 13:15:02 INFO mapreduce.Job:  map 0% reduce 0%
16/09/02 13:15:02 INFO mapreduce.Job: Job job_local455553963_0001 failed with state FAILED due to: NA
16/09/02 13:15:02 INFO mapreduce.Job: Counters: 0

No result (output) is produced. Why am I getting that exception?
Thanks.
EDIT:
Thanks to the suggested solutions, I realized there is a second way to run it (shown in the WordCount example): first create a patterns file:

echo "\." > patterns.txt
echo "\," >> patterns.txt
echo "\!" >> patterns.txt
echo "to" >> patterns.txt

Then run:

bin/hadoop jar ws.jar WordCount2 -Dwordcount.case.sensitive=true input output -skip patterns.txt

Everything works great!
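
To view the resulting counts (standard HDFS shell usage; output/* is where FileOutputFormat writes the reducer output):

bin/hdfs dfs -cat output/*

Note that MapReduce refuses to start if the output directory already exists, so remove it between runs with bin/hdfs dfs -rm -r output.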

Answer 1 (r8uurelv):

The problem is probably in this part of the code:

caseSensitive = conf.getBoolean("wordcount.case.sensitive", true);
if (conf.getBoolean("wordcount.skip.patterns", true)) {
    URI[] patternsURIs = Job.getInstance(conf).getCacheFiles();
    for (URI patternsURI : patternsURIs) {
        Path patternsPath = new Path(patternsURI.getPath());
        String patternsFileName = patternsPath.getName().toString();
        parseSkipFile(patternsFileName);
    }
}

Here getCacheFiles() is returning null for whatever reason, so when you try to iterate over patternsURIs (which is null) you get the exception.
To fix this, check whether patternsURIs is null before starting the loop:

if (patternsURIs != null) {
    for (URI patternsURI : patternsURIs) {
        Path patternsPath = new Path(patternsURI.getPath());
        String patternsFileName = patternsPath.getName().toString();
        parseSkipFile(patternsFileName);
    }
}

You should also investigate why it is null in the first place, if you are not expecting null.
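
A minimal sketch of such a check, assuming the same setup() context as the snippet above (the warning text is illustrative, not from the original code):

URI[] patternsURIs = Job.getInstance(conf).getCacheFiles();
if (patternsURIs == null) {
    // getCacheFiles() returns null when the driver never called
    // job.addCacheFile(...), e.g. because -skip was not passed on the
    // command line.
    System.err.println("wordcount.skip.patterns is enabled but no cache files were registered");
} else {
    for (URI patternsURI : patternsURIs) {
        parseSkipFile(new Path(patternsURI.getPath()).getName());
    }
}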

Answer 2 (xa9qqrwz):

The problem occurs in the mapper's setup() method. This WordCount example is a bit more advanced than the usual one: it lets you specify a file containing patterns that the mapper will filter out. That file is added to the distributed cache in the main() method so the mapper can open it on every node.
You can see the file being added to the cache in main():

for (int i=0; i < remainingArgs.length; ++i) {
    if ("-skip".equals(remainingArgs[i])) {
        job.addCacheFile(new Path(remainingArgs[++i]).toUri());
        job.getConfiguration().setBoolean("wordcount.skip.patterns", true);
    } else {
        otherArgs.add(remainingArgs[i]);
    }
}

You didn't specify the -skip option, so nothing gets added to the cache. You can also see that when a file is added, wordcount.skip.patterns is set to true.
In the mapper's setup() you have this code:

@Override
public void setup(Context context) throws IOException, InterruptedException {
    conf = context.getConfiguration();
    caseSensitive = conf.getBoolean("wordcount.case.sensitive", true);
    if (conf.getBoolean("wordcount.skip.patterns", true)) {
        URI[] patternsURIs = Job.getInstance(conf).getCacheFiles();
        for (URI patternsURI : patternsURIs) {
            Path patternsPath = new Path(patternsURI.getPath());
            String patternsFileName = patternsPath.getName().toString();
            parseSkipFile(patternsFileName);
        }
    }
}

The problem is that the check conf.getBoolean("wordcount.skip.patterns", true) defaults to true when the property has not been set, and in your case it is never set. As a result patternsURIs (or something near it; I don't have the line numbers) ends up null.
So you can fix it by changing the default of wordcount.skip.patterns to false, setting it to false in your driver (the main method), or providing a skip file.
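
Putting the suggested fixes together, a hedged sketch of a corrected setup() (same names as the tutorial code above; the false default is the key change, and the null guard from the first answer is kept as an extra safety net):

@Override
public void setup(Context context) throws IOException, InterruptedException {
    conf = context.getConfiguration();
    caseSensitive = conf.getBoolean("wordcount.case.sensitive", true);
    // Default to false: only consult the distributed cache when the
    // driver's -skip branch has explicitly set wordcount.skip.patterns.
    if (conf.getBoolean("wordcount.skip.patterns", false)) {
        URI[] patternsURIs = Job.getInstance(conf).getCacheFiles();
        if (patternsURIs != null) {
            for (URI patternsURI : patternsURIs) {
                Path patternsPath = new Path(patternsURI.getPath());
                parseSkipFile(patternsPath.getName());
            }
        }
    }
}

Alternatively, keep setup() as-is and set the flag explicitly in the driver before submission: job.getConfiguration().setBoolean("wordcount.skip.patterns", false);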
