我试图用python代码执行hadoop streaming with mapreduce,但是它总是给出相同的错误结果, File: file:/C:/py-hadoop/map.py is not readable
或 Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
我使用hadoop3.1.1和python3.8以及windows10操作系统
这是我的map reduce命令行
hadoop jar C:/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.1.1.jar -file C:/py-hadoop/map.py,C:/py-hadoop/reduce.py -mapper "python map.py" -reducer "python reduce.py" -input /user/input -output /user/python-output
Map.py
import sys
for line in sys.stdin:
line = line.strip()
words = line.split()
for word in words:
print ("%s\t%s" % (word, 1))
减少.py
from operator import itemgetter
import sys
current_word = None
current_count = 0
word = None
clean = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
for line in sys.stdin:
line = line.strip()
word, count = line.split('\t', 1)
try:
count = int(count)
except ValueError:
continue
word = filter(lambda x: x in clean, word).lower()
if current_word == word:
current_count += count
else:
if current_word:
print ("%s\t%s" % (current_word, current_count))
current_count = count
current_word = word
if current_word == word:
print ("%s\t%s" % (current_word, current_count))
也尝试过不同的命令行,比如
hadoop jar C:/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.1.1.jar -file C:/py-hadoop/map.py -mapper "python map.py" -file C:/py-hadoop/reduce.py -reducer "python reduce.py" -input /user/input -output /user/python-output
和
hadoop jar C:/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.1.1.jar -file py-hadoop/map.py -mapper "python map.py" -file py-hadoop/reduce.py -reducer "python reduce.py" -input /user/input -output /user/python-output
但仍然给出了完全相同的错误结果,
如果我的英语不好,我很抱歉,我不是以英语为母语的人
1条答案
按热度按时间xkrw2x1b1#
已经修复了,问题是reduce.py引起的,这是我的新reduce.py
这是我以前运行的命令行