I don't understand why standard Python code produces unexpected results when it is converted to MapReduce with mrjob.
Sample data from the .txt file:
1 12
1 14
1 15
1 16
1 18
1 12
2 11
2 11
2 13
3 12
3 15
3 11
3 10
This code builds a dictionary and performs a simple division:
dic = {}
with open('numbers.txt', 'r') as fi:
    for line in fi:
        parts = line.split()
        dic.setdefault(parts[0], []).append(int(parts[1]))
print(dic)
for k, v in dic.items():
    print(k, 1/len(v), v)
Result:
{'1': [12, 14, 15, 16, 18, 12], '2': [11, 11, 13], '3': [12, 15, 11, 10]}
1 0.16666666666666666 [12, 14, 15, 16, 18, 12]
2 0.3333333333333333 [11, 11, 13]
3 0.25 [12, 15, 11, 10]
But when converted to MapReduce using mrjob:
from mrjob.job import MRJob
from mrjob.step import MRStep
from collections import defaultdict

class test(MRJob):

    def steps(self):
        return [MRStep(mapper=self.divided_vals)]

    def divided_vals(self, _, line):
        dic = {}
        parts = line.split()
        dic.setdefault(parts[0], []).append(int(parts[1]))
        for k, v in dic.items():
            yield (k, 1/len(v)), v

if __name__ == '__main__':
    test.run()
Result:
["2", 1.0] [11]
["2", 1.0] [13]
["3", 1.0] [12]
["3", 1.0] [15]
["3", 1.0] [11]
["3", 1.0] [10]
["1", 1.0] [12]
["1", 1.0] [14]
["1", 1.0] [15]
["1", 1.0] [16]
["1", 1.0] [18]
["1", 1.0] [12]
["2", 1.0] [11]
Why doesn't MapReduce group and calculate in the same way? How can I recreate the standard Python result in MapReduce?
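My current understanding is that the mapper is called once per input line, so dic never holds more than one value and 1/len(v) is always 1.0; the grouping I get for free from the single dictionary would have to happen in the shuffle/reduce phase instead. Below is a minimal sketch of what I think the job should look like (class and method names are placeholders I made up), with the division moved into a reducer:

from mrjob.job import MRJob
from mrjob.step import MRStep

class DividedVals(MRJob):

    def steps(self):
        return [MRStep(mapper=self.mapper_get_vals,
                       reducer=self.reducer_divide)]

    def mapper_get_vals(self, _, line):
        # emit (key, value) for each line; grouping by key happens in the shuffle
        parts = line.split()
        yield parts[0], int(parts[1])

    def reducer_divide(self, key, values):
        # values is an iterator over all numbers that share this key
        vals = list(values)
        yield (key, 1/len(vals)), vals

if __name__ == '__main__':
    DividedVals.run()

Is a mapper-plus-reducer step like this the right way to reproduce the grouped output, or is there a way to do it in the mapper alone?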