I don't understand why standard Python code produces unexpected results when it is converted to MapReduce with mrjob.
Sample data from the .txt file:
1 12
1 14
1 15
1 16
1 18
1 12
2 11
2 11
2 13
3 12
3 15
3 11
3 10
This code builds a dictionary and performs a simple division:
dic = {}
with open('numbers.txt', 'r') as fi:
    for line in fi:
        parts = line.split()
        dic.setdefault(parts[0], []).append(int(parts[1]))
print(dic)
for k, v in dic.items():
    print(k, 1/len(v), v)
Result:
{'1': [12, 14, 15, 16, 18, 12], '2': [11, 11, 13], '3': [12, 15, 11, 10]}
1 0.16666666666666666 [12, 14, 15, 16, 18, 12]
2 0.3333333333333333 [11, 11, 13]
3 0.25 [12, 15, 11, 10]
But when converted to MapReduce using mrjob:
from mrjob.job import MRJob
from mrjob.step import MRStep
from collections import defaultdict

class test(MRJob):

    def steps(self):
        return [MRStep(mapper=self.divided_vals)]

    def divided_vals(self, _, line):
        dic = {}
        parts = line.split()
        dic.setdefault(parts[0], []).append(int(parts[1]))
        for k, v in dic.items():
            yield (k, 1/len(v)), v

if __name__ == '__main__':
    test.run()
Result:
["2", 1.0] [11]
["2", 1.0] [13]
["3", 1.0] [12]
["3", 1.0] [15]
["3", 1.0] [11]
["3", 1.0] [10]
["1", 1.0] [12]
["1", 1.0] [14]
["1", 1.0] [15]
["1", 1.0] [16]
["1", 1.0] [18]
["1", 1.0] [12]
["2", 1.0] [11]
Why doesn't MapReduce group and calculate in the same way? How can I recreate the standard Python result in MapReduce?
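My current understanding is that the mapper is called once per input line, so dic never holds more than one value and 1/len(v) is always 1.0; the grouping I get for free from the single dictionary would have to happen in the shuffle/reduce phase instead. Below is a minimal sketch of what I think the job should look like (class and method names are placeholders I made up), with the division moved into a reducer:

from mrjob.job import MRJob
from mrjob.step import MRStep

class DividedVals(MRJob):

    def steps(self):
        return [MRStep(mapper=self.mapper_get_vals,
                       reducer=self.reducer_divide)]

    def mapper_get_vals(self, _, line):
        # emit (key, value) for each line; grouping by key happens in the shuffle
        parts = line.split()
        yield parts[0], int(parts[1])

    def reducer_divide(self, key, values):
        # values is an iterator over all numbers that share this key
        vals = list(values)
        yield (key, 1/len(vals)), vals

if __name__ == '__main__':
    DividedVals.run()

Is a mapper-plus-reducer step like this the right way to reproduce the grouped output, or is there a way to do it in the mapper alone?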