Comparing multiple values given to a key in Python

Posted by fafcakar on 2021-06-02 in Hadoop


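I am trying to develop a mapper and a reducer in Python for a particular data set. I have already written the mapper, which emits the store name and the cost of each transaction made at that store.
For example:

Nike $45.99
Adidas $72.99
Puma $56.99
Nike $109.99
Adidas $85.99

Here the key is the store name and the value is the transaction cost. Now I am trying to write the reducer, which should compare the transaction costs for each store and output the highest transaction for each store.
The output I want is:

Nike $109.99
Adidas $85.99
Puma $56.99

My question is: how do I compare the different values given to a key in Python?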

hadoop python key-value

Source: https://stackoverflow.com/questions/37782885/comparing-multiple-values-given-to-a-key-in-python

3 Answers


von4xj4u1#

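Well, the MapReduce paradigm works on key-value pairs, and every mapper should emit its output in exactly that format.
As for the reducer, the shuffle-and-sort phase of the Hadoop framework guarantees that a reducer receives all the values for a given key, so two different reducers can never receive different entries for the same key.
However, a single reducer may have several keys to process.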
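The guarantee described above can be imitated locally, which may help when testing reducer logic without a cluster. The following is only a rough sketch: `shuffle_and_group` is a hypothetical helper name, the sample pairs come from the question, and real Hadoop performs this step across machines rather than in one process.

```python
from itertools import groupby
from operator import itemgetter


def shuffle_and_group(pairs):
    """Imitate shuffle/sort: order the pairs by key, then collect values per key."""
    ordered = sorted(pairs, key=itemgetter(0))  # stable sort on the key only
    return {key: [value for _, value in group]
            for key, group in groupby(ordered, key=itemgetter(0))}


# Mapper output from the question, as (store, amount) pairs.
mapper_output = [("Nike", 45.99), ("Adidas", 72.99), ("Puma", 56.99),
                 ("Nike", 109.99), ("Adidas", 85.99)]

grouped = shuffle_and_group(mapper_output)
print(grouped)
```

After this step every key owns the complete list of its values, which is the situation a reducer can rely on.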
For your problem, suppose the same key has three different values, for example:

Nike $109.99
Nike $45.99
Nike $294.99

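The reducer starts by comparing two values, so for this key the reduce function works on:

$109.99 $45.99

Using a simple comparison it has to output the higher one, $109.99. That output becomes an input the next time the reduce function runs, this time with:

$109.99 $294.99

Again, using a comparison, you should output the higher value, namely $294.99. As for the code, you need a very simple function, for example:
Edit: I assumed your delimiter is a tab, but you can change the format to whatever you are using.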


#!/usr/bin/env python

import sys

current_store = None
current_max = None

# Input comes from STDIN, one "store<TAB>amount" pair per line.
for line in sys.stdin:
    # Remove leading and trailing whitespace.
    line = line.strip()

    try:
        # Parse the input we got from the mapper.
        store, amount = line.split('\t', 1)
        # Amounts look like "$45.99": drop the dollar sign, convert to float.
        amount = float(amount.strip().lstrip('$'))
    except ValueError:
        # The line was malformed or the amount was not a number,
        # so silently ignore/discard this line.
        continue

    # This if/else only works because Hadoop sorts map output
    # by key (here: store) before it is passed to the reducer.
    if current_store == store:
        if amount > current_max:
            current_max = amount
    else:
        if current_store is not None:
            # Write the finished store's result to STDOUT.
            print('%s\t$%.2f' % (current_store, current_max))
        current_max = amount
        current_store = store

# Do not forget to output the last store, if there was any input.
if current_store is not None:
    print('%s\t$%.2f' % (current_store, current_max))
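The pairwise comparison walked through in this answer is essentially a fold over the values of one key. As a small illustration, assuming the amounts have already been converted to floats, it can be written with `functools.reduce`; `higher` is a hypothetical helper name, not part of Hadoop or of the answer's script.

```python
from functools import reduce


def higher(a, b):
    """Compare two amounts and keep the larger one."""
    return a if a >= b else b


# The three Nike values from the example above.
amounts = [109.99, 45.99, 294.99]

# Step 1: higher(109.99, 45.99) -> 109.99
# Step 2: higher(109.99, 294.99) -> 294.99
highest = reduce(higher, amounts)
print(highest)
```

For plain numbers the built-in `max(amounts)` gives the same result; the explicit fold is shown only to mirror the step-by-step description.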


6fe3ivhb2#

def largest_value(_dict):
    # Map each key to the largest value in its list.
    return {key: max(values) for key, values in _dict.items()}

def dict_from_iterable(iterable, sep):
    # Group the amounts by store name, e.g. {'Nike': [45.99, 109.99], ...}
    d = {}
    for line in iterable:
        fields = line.rstrip().replace('$', '').split(sep)
        if fields[0] in d:
            d[fields[0]].append(float(fields[1]))
        else:
            d[fields[0]] = [float(fields[1])]
    return d

def dict_from_txt(file, sep):
    # Same as dict_from_iterable, but reads the lines from a text file.
    with open(file) as f:
        return dict_from_iterable(f, sep)

data = ['Nike $45.99',
        'Adidas $72.99',
        'Puma $56.99',
        'Nike $109.99',
        'Adidas $85.99']
print(largest_value(dict_from_iterable(data, ' ')))

# To use your own file, delete the previous line and uncomment the next one.
# print(largest_value(dict_from_txt('my_file', ' ')))
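The dictionary approach in this answer stores every amount before taking the maximum. A possible variation, shown here only as a sketch, keeps just the running maximum per store so memory grows with the number of stores rather than the number of transactions. `max_per_store` is a hypothetical name, and the space separator and `$` prefix are assumptions taken from the sample data.

```python
def max_per_store(lines, sep=" "):
    """Return {store: highest amount}, keeping only one number per store."""
    best = {}
    for line in lines:
        store, amount = line.rstrip().split(sep, 1)
        value = float(amount.replace("$", ""))
        # Keep the new value only if it beats what we have seen so far.
        if store not in best or value > best[store]:
            best[store] = value
    return best


data = ["Nike $45.99",
        "Adidas $72.99",
        "Puma $56.99",
        "Nike $109.99",
        "Adidas $85.99"]

result = max_per_store(data)
print(result)
```

Unlike the streaming reducers in the other answers, this version does not need sorted input, at the cost of holding one entry per store in memory.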


ryhaxcpt3#

Hadoop should sort the mapper's output before passing it to the reducer. Given that, you can use itertools.groupby() to group identical keys together and then pick the largest value from each group:


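#!/usr/bin/env python

import sys
from itertools import groupby

# Each input line is "store amount"; split it into fields and group
# consecutive lines that share the same store name.
for store, transactions in groupby((line.split() for line in sys.stdin),
                                   key=lambda line: line[0]):
    # Strip the dollar sign, convert to float and keep the largest amount.
    print(store, max(float(amount[1].replace('$', '')) for amount in transactions))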

Of course, this assumes that the mapper's output consists of two whitespace-separated fields, the store and the transaction value.
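Because `itertools.groupby()` only merges consecutive items with the same key, this approach depends on the input being sorted. The sketch below illustrates that point with an in-memory list instead of `sys.stdin`; `max_by_store` is a hypothetical wrapper name, and the sample lines are taken from the question.

```python
from itertools import groupby


def max_by_store(lines):
    """Group consecutive lines by store and return (store, max amount) pairs."""
    rows = (line.split() for line in lines if line.strip())
    return [(store, max(float(fields[1].lstrip("$")) for fields in group))
            for store, group in groupby(rows, key=lambda fields: fields[0])]


unsorted_lines = ["Nike $45.99", "Adidas $72.99", "Nike $109.99"]

# Without sorting, Nike shows up as two separate groups.
print(max_by_store(unsorted_lines))

# With sorted input, as Hadoop delivers it, each store appears once.
print(max_by_store(sorted(unsorted_lines)))
```

When testing a reducer like this outside Hadoop, sorting the mapper output first reproduces the ordering the reducer expects.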

