Shuffle on an empty file fails with EOFException: Unexpected end of input stream

iqxoj9l9  posted on 2021-07-15  in Hadoop

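I am trying to run a copy of a data processing pipeline on my local machine, with Hadoop and HBase running in standalone mode. The same pipeline works correctly on a cluster. It consists of several MapReduce jobs that are started one after another. In one of these jobs the mapper writes nothing to its output (this depends on the input, but in my test it writes nothing), yet the job still has a reducer. While this job is running I get the following exception: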

16:42:19,322 [INFO] [localfetcher#13] o.a.h.i.c.CodecPool: Got brand-new decompressor [.gz] 
16:42:19,322 [INFO] [localfetcher#13] o.a.h.m.t.r.LocalFetcher: localfetcher#13 about to shuffle output of map attempt_local509755465_0013_m_000000_0 decomp: 2 len: 6 to MEMORY
16:42:19,326 [WARN] [Thread-4749] o.a.h.m.LocalJobRunner: job_local509755465_0013 java.lang.Exception: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in localfetcher#13
  at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.5.1.jar:?]
  at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) [hadoop-mapreduce-client-common-2.5.1.jar:?]
Caused by: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in localfetcher#13
  at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
  at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) ~[hadoop-mapreduce-client-common-2.5.1.jar:?]
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_181]
  at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266) ~[?:1.8.0_181]
  at java.util.concurrent.FutureTask.run(FutureTask.java) ~[?:1.8.0_181]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_181]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_181]
  at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_181]
Caused by: java.io.EOFException: Unexpected end of input stream
  at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:145) ~[hadoop-common-2.7.3.jar:?]
  at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85) ~[hadoop-common-2.7.3.jar:?]
  at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:199) ~[hadoop-common-2.7.3.jar:?]
  at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
  at org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.copyMapOutput(LocalFetcher.java:157) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
  at org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.doCopy(LocalFetcher.java:102) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
  at org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.run(LocalFetcher.java:85) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]

I checked the files produced by the mapper. I expected them to be empty, because the mapper did not write anything, but they contain strange text:
File: /tmp/hadoop-egorkiruhin/mapred/local/localrunner/egorkiruhin/jobcache/job_local509755465_0013/attempt_local509755465_0013_m_000000_0/output/file.out
ÿÿÿÿ^@^@
File: /tmp/hadoop-egorkiruhin/mapred/local/localrunner/egorkiruhin/jobcache/job_local509755465_0013/attempt_local509755465_0013_m_000000_0/output/file.out.index
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^B^@^@^@^@^@^@^@^@^@^F^@^@^@^@dtg<93>
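
The bytes in file.out.index are probably not random. As a rough sketch, assuming the index file stores three big-endian 8-byte values per partition (start offset, raw length, part length) followed by an 8-byte checksum, which is my understanding of how Hadoop's SpillRecord lays it out, the record can be decoded with plain JDK classes. The class name IndexRecordDecoder and the sample bytes are hypothetical: the sample is built from the "decomp: 2 len: 6" values in the log above, not copied from the dump, and its checksum bytes are left as zeros.

```java
import java.nio.ByteBuffer;

public class IndexRecordDecoder {
    // Assumed record size: three 8-byte longs per partition.
    static final int RECORD_LENGTH = 24;

    /** Returns {startOffset, rawLength, partLength} for one partition. */
    public static long[] decode(byte[] indexBytes, int partition) {
        // ByteBuffer reads big-endian by default, matching the assumed layout.
        ByteBuffer buf = ByteBuffer.wrap(indexBytes);
        int base = partition * RECORD_LENGTH;
        return new long[] {
            buf.getLong(base),       // start offset inside file.out
            buf.getLong(base + 8),   // raw (uncompressed) length
            buf.getLong(base + 16)   // part (on-disk) length
        };
    }

    public static void main(String[] args) {
        // Hypothetical sample: one 24-byte record plus 8 placeholder checksum bytes.
        byte[] sample = new byte[32];
        sample[15] = 0x02; // raw length = 2
        sample[23] = 0x06; // part length = 6
        long[] rec = decode(sample, 0);
        System.out.println("start=" + rec[0] + " raw=" + rec[1] + " part=" + rec[2]);
        // prints "start=0 raw=2 part=6"
    }
}
```

If that layout assumption holds, the index describes a 6-byte segment with a raw length of 2, which matches the fetcher's log line and the 6 bytes shown for file.out.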

Answer 1 (2vuwiymt)

I could not find an explanation for this problem, but I worked around it by turning off compression of the mapper output:

config.set("mapreduce.map.output.compress", "false");
