How to run a MapReduce program locally in the Windows 10 cmd shell

Posted by w1e3prcc on 2021-05-27 in Hadoop


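I am trying to run a MapReduce program locally on my laptop, which has Hadoop 2.8 installed. I don't know how to use the command below in the cmd shell.
Here is my command, along with the mapper and reducer code and the data in my CSV file.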

D:\hadoop\bin\hadoop jar D:\hadoop\share\hadoop\tools\lib\hadoop-streaming-2.3.0.jar 
-D mapred.reduce.tasks=0
-file /reducer.py -mapper "mapper.py" 
-input /data2.csv -input /data2.csv 
-output /output
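
For comparison, here is a minimal sketch of how the argument list for a streaming job with both a mapper and a reducer might be assembled. It is not a verified fix for the command above. It rests on several assumptions: `mapper.py` and `reducer.py` sit in the current directory, a `python` interpreter is on the PATH, and the streaming jar path is the one quoted in the question (it should match the installed Hadoop version). The helper name `build_streaming_command` is hypothetical. Note that `mapred.reduce.tasks=0` requests a map-only job, so the sketch uses `mapreduce.job.reduces=1` instead and places the generic options (`-D`, `-files`) before the streaming options.

```python
import subprocess


def build_streaming_command(hadoop_cmd, streaming_jar, input_path, output_path):
    """Return the argument list for a Hadoop Streaming job (hypothetical helper)."""
    return [
        hadoop_cmd, "jar", streaming_jar,
        # Generic options go first. One reducer instead of a map-only job.
        "-D", "mapreduce.job.reduces=1",
        # Ship both scripts with the job (assumed to be in the current directory).
        "-files", "mapper.py,reducer.py",
        # Invoke the interpreter explicitly, since cmd does not honour shebang lines.
        "-mapper", "python mapper.py",
        "-reducer", "python reducer.py",
        "-input", input_path,
        "-output", output_path,
    ]


cmd = build_streaming_command(
    r"D:\hadoop\bin\hadoop",
    r"D:\hadoop\share\hadoop\tools\lib\hadoop-streaming-2.3.0.jar",  # path from the question
    "/data2.csv",
    "/output",
)

# Print a single line that can be pasted into the cmd shell.
print(subprocess.list2cmdline(cmd))
```

Printing the command rather than running it keeps the sketch safe to try on a machine where Hadoop is not yet configured. The output directory must not already exist when the job is actually submitted.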


#!/usr/bin/python3

# mapper.py

import sys

# Read CSV rows from STDIN (standard input)

for line in sys.stdin:
    fields = line.strip().split(",")

    # Columns 1 and 2 are read below, so a row needs at least 3 fields
    if len(fields) >= 3:
        sex = fields[1]
        age = fields[2]
        print('%s\t%s' % (sex, age))


#!/usr/bin/python3

# reducer.py

import sys

sex_age = {}

# Group the ages by sex

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue  # skip blank lines
    sex, age = line.split('\t')
    sex_age.setdefault(sex, []).append(int(age))

# Reducer: print the average age for each sex

for sex in sex_age:
    ave_age = sum(sex_age[sex]) / len(sex_age[sex])
    print('%s\t%s' % (sex, ave_age))
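
Before involving Hadoop at all, the mapper and reducer logic can be checked in plain Python. The sketch below imitates the map, sort, and reduce stages in a single process. The function names (`run_mapper`, `run_reducer`, `simulate`) and the sample rows are hypothetical, and the column layout (id, sex, age) is an assumption, since the actual CSV data is not shown.

```python
def run_mapper(lines):
    """Same logic as mapper.py: emit 'sex<TAB>age' for rows with at least 3 fields."""
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) >= 3:
            yield "%s\t%s" % (fields[1], fields[2])


def run_reducer(lines):
    """Same logic as reducer.py: average the ages for each sex."""
    sex_age = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        sex, age = line.split("\t")
        sex_age.setdefault(sex, []).append(int(age))
    # Keys are sorted here only to make the output order predictable.
    for sex in sorted(sex_age):
        yield "%s\t%s" % (sex, sum(sex_age[sex]) / len(sex_age[sex]))


def simulate(csv_lines):
    """Map, sort (standing in for Hadoop's shuffle phase), then reduce."""
    mapped = sorted(run_mapper(csv_lines))
    return list(run_reducer(mapped))


# Hypothetical sample rows in the assumed layout: id, sex, age
sample = ["1,F,30", "2,M,40", "3,F,20"]
result = simulate(sample)
for row in result:
    print(row)
```

If this produces sensible averages for a few rows copied from the real CSV file, any remaining problem is more likely in the Hadoop command or environment than in the Python scripts.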

hadoop mapreduce python hadoop-streaming

Source: https://stackoverflow.com/questions/59750303/how-to-run-mapreduce-program-locally-on-laptop-on-cmd-shell-in-windows-10