MapReduce Python

Posted by ifsvaxew on 2022-09-20 in MapReduce


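I am completely new to Python and MapReduce. It would be great if someone could help me achieve the following result. From a list like the one below, I want to compute the number of entries for each key and the average value for each key. The first number in each pair is the key and the second is the value.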

1,5
1,5
2,7
2,8
1,10
2,10
3,3
1,20

The output should look like this (key, count, average):

1, 4, 10
2, 3, 8.3
3, 1, 3

Thanks

hadoop

Source: https://stackoverflow.com/questions/73259921/mapreduce-python

1 answer


bkkx9g8r1#

I suggest using itertools rather than reduce.

import itertools
import functools
import statistics

data = [[1,5], [1,5], [2,7], [2,8], [1,10], [2,10], [3,3], [1,20]]

# First, sort and group the input by key

sorted_data = sorted(data, key=lambda x: x[0])
grouped = itertools.groupby(sorted_data, lambda e: e[0])

# Conceptually, this yields a structure like the following
# (each group is actually a lazy iterator, shown here as a list):
# [
#   (1, [[1, 5], [1, 5], [1, 10], [1, 20]]),
#   (2, [[2, 7], [2, 8], [2, 10]]),
#   (3, [[3, 3]])
# ]

# Remove the duplicate keys from the structure

remove_duplicate_keys = map(lambda x: (x[0], [e[1] for e in x[1]]), grouped)

# This will produce the following structure:
# [
#   (1, [5, 5, 10, 20]),
#   (2, [7, 8, 10]),
#   (3, [3])
# ]

# Now, calculate count and mean for each entry

result = map(lambda x: (x[0], len(x[1]), statistics.mean(x[1])), remove_duplicate_keys)

# This will result in the following list:

# [(1, 4, 10), (2, 3, 8.333333333333334), (3, 1, 3)]
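
For comparison, here is a minimal sketch of the same calculation using functools.reduce, which is closer in spirit to the reduce step of MapReduce. The helper name `accumulate` and the (count, total) accumulator layout are assumptions made for illustration; they are not part of the code above.

```python
import functools

data = [[1, 5], [1, 5], [2, 7], [2, 8], [1, 10], [2, 10], [3, 3], [1, 20]]

def accumulate(acc, pair):
    # Hypothetical helper: fold one (key, value) pair into a dict
    # that maps key -> (count, total).
    key, value = pair
    count, total = acc.get(key, (0, 0))
    acc[key] = (count + 1, total + value)
    return acc

# Fold every pair into the accumulator, starting from an empty dict.
totals = functools.reduce(accumulate, data, {})

# Derive (key, count, mean) from the accumulated counts and totals.
reduced = [(key, count, total / count)
           for key, (count, total) in sorted(totals.items())]
print(reduced)
```

Unlike the groupby version, this does not require the input to be sorted first, but it keeps one running entry per key in memory, and the means are always floats because of the division.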

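Note: groupby and map both return lazy iterators. This means Python does not compute anything until you start consuming the result, but you can only iterate over the elements once. If you need them in a regular list, or need to access the data more than once, replace the last line with: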

result = list(map(lambda x: (x[0], len(x[1]), statistics.mean(x[1])), remove_duplicate_keys))

This converts the original chain of lazy iterators into a regular list.
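
The single-pass behaviour can be seen in a small sketch. The variable names `keys`, `first_pass`, and `second_pass` are illustrative only.

```python
data = [[1, 5], [1, 5], [2, 7], [2, 8], [1, 10], [2, 10], [3, 3], [1, 20]]

# map() returns a lazy iterator; nothing is computed yet.
keys = map(lambda pair: pair[0], data)

first_pass = list(keys)   # consumes the iterator
second_pass = list(keys)  # the iterator is already exhausted

print(first_pass)   # [1, 1, 2, 2, 1, 2, 3, 1]
print(second_pass)  # []
```

Wrapping the final map in list(), as shown above, avoids this because a list can be read as many times as needed.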

