Hadoop Streaming cannot run Python

efzxgjgh · posted 2021-05-27 · in Hadoop
Follow (0) | Answers (1) | Views (426)

I am trying to run a Hadoop Streaming MapReduce job written in Python, but it always fails with the same errors:

File: file:/C:/py-hadoop/map.py is not readable
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1

I am using Hadoop 3.1.1 and Python 3.8 on Windows 10.
Here is my MapReduce command line:

hadoop jar C:/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.1.1.jar -file C:/py-hadoop/map.py,C:/py-hadoop/reduce.py -mapper "python map.py" -reducer "python reduce.py" -input /user/input -output /user/python-output

map.py

import sys

# Emit "<word>\t1" for every word read from stdin
for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        print("%s\t%s" % (word, 1))

reduce.py

from operator import itemgetter
import sys

current_word = None
current_count = 0
word = None
clean = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '

for line in sys.stdin:
    line = line.strip()
    word, count = line.split('\t', 1)
    try:
        count = int(count)
    except ValueError:
        continue
    word = filter(lambda x: x in clean, word).lower()
    if current_word == word:
        current_count += count
    else:
        if current_word:
            print ("%s\t%s" % (current_word, current_count))
        current_count = count
        current_word = word

if current_word == word:
    print ("%s\t%s" % (current_word, current_count))

I have also tried different command lines, such as:

hadoop jar C:/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.1.1.jar -file C:/py-hadoop/map.py -mapper "python map.py" -file C:/py-hadoop/reduce.py -reducer "python reduce.py" -input /user/input -output /user/python-output

hadoop jar C:/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.1.1.jar -file py-hadoop/map.py -mapper "python map.py" -file py-hadoop/reduce.py -reducer "python reduce.py" -input /user/input -output /user/python-output

but they all give exactly the same error.
Sorry if my English is bad, I am not a native speaker.

xkrw2x1b

Fixed it — the problem was caused by reduce.py: in Python 3, filter() returns a lazy iterator rather than a string, so calling .lower() on its result raises an AttributeError, which makes the reducer subprocess exit with code 1. Here is my new reduce.py:

import sys
import collections

# Sum the counts per word; a Counter also copes with unsorted
# input, unlike the old current_word/current_count approach
counter = collections.Counter()

for line in sys.stdin:
    word, count = line.strip().split("\t", 1)

    counter[word] += int(count)

# Emit proper tab-delimited "word\tcount" lines
# (print(x[0], "\t", x[1]) would add stray spaces around the tab)
for word, count in counter.most_common(9999):
    print("%s\t%s" % (word, count))
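The root cause can be seen in isolation: in Python 3, filter() returns a lazy iterator rather than a string, so the old reduce.py's filter(...).lower() raises AttributeError. A minimal demonstration, using a hypothetical sample token:

```python
clean = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '

word = "Hello,"  # hypothetical sample token

# Python 3: filter() yields a lazy iterator, not a str,
# so the old line filter(...).lower() raises AttributeError.
filtered = filter(lambda x: x in clean, word)
print(hasattr(filtered, "lower"))  # False

# Joining the iterator back into a string first would also have fixed that line:
cleaned = "".join(filter(lambda x: x in clean, word)).lower()
print(cleaned)  # hello
```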

And here is the command line I used to run it:

hadoop jar C:/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.1.1.jar -file C:/py-hadoop/map.py -file C:/py-hadoop/reduce.py -mapper "python map.py" -reducer "python reduce.py" -input /user/input -output /user/python-output
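Before resubmitting to Hadoop, the whole pipeline can also be sanity-checked without a cluster by piping locally (on Windows: type input.txt | python map.py | sort | python reduce.py). A self-contained sketch of the same idea in pure Python, with made-up input lines:

```python
import collections

def mapper(lines):
    # Mirrors map.py: emit "<word>\t1" for each word
    for line in lines:
        for word in line.strip().split():
            yield "%s\t1" % word

def reducer(pairs):
    # Mirrors the fixed reduce.py: sum counts per word
    counter = collections.Counter()
    for pair in pairs:
        word, count = pair.strip().split("\t", 1)
        counter[word] += int(count)
    return counter

sample = ["hello hadoop", "hello streaming"]  # made-up input lines
shuffled = sorted(mapper(sample))             # stands in for Hadoop's sort/shuffle phase
counts = reducer(shuffled)
print(counts["hello"])  # 2
```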
