What exactly is the output of the mapper and reducer functions?

Posted by 9fkzdhlc on 2021-05-29 in Hadoop


This is a follow-up to my earlier question about extracting rows that contain a specific value using MapReduce and Hadoop.
Mapper function

public static class MapForWordCount extends Mapper<Object, Text, Text, IntWritable>{

private IntWritable saleValue = new IntWritable();
private Text rangeValue = new Text();

public void map(Object key, Text value, Context con) throws IOException, InterruptedException
{
    String line = value.toString();
    String[] words = line.split(",");
    for(String word: words )
    {
        if(words[3].equals("40")){  
            saleValue.set(Integer.parseInt(words[0]));
            rangeValue.set(words[3]);
            con.write( rangeValue , saleValue );
        }
    }
}   
}

Reducer function

public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable>  
{  
    private IntWritable result = new IntWritable();  
    public void reduce(Text word, Iterable<IntWritable> values, Context con) throws IOException, InterruptedException  
    {  
        for(IntWritable value : values)  
        {  
            result.set(value.get());  
            con.write(word, result);  
        }  
    }  
}

The output I get is

Edit 1: But the expected output is

40 102  
40 104  
40 105

What am I doing wrong?
What exactly is happening in the mapper and reducer functions?

hadoop mapreduce hadoop2 feature-extraction Mapper

Source: https://stackoverflow.com/questions/37093158/what-exactly-is-output-of-mapper-and-reducer-function

3 Answers


wgeznvg7 #1

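What exactly is happening
You are consuming comma-delimited lines of text, splitting them on the commas, and filtering out some values. If all you are doing is extracting those values, con.write() should be called only once per line.
After the map phase, the framework groups all the "40" keys you output and forms a list of all the values written with that key. That list is what the reducer reads.
You should probably try this for your map function.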

// Set the values to write 
saleValue.set(Integer.parseInt(words[0]));
rangeValue.set(words[3]);

// Filter out only the 40s
if(words[3].equals("40")) {
    // Write out "(40, saleValue)" words.length times 
    for(String word: words )
    {
        con.write( rangeValue , saleValue );
    }
}

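If you don't want the value duplicated once for every element of the split string, remove the for loop.
All your reducer is doing is printing out whatever it received from the mapper.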
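To make the duplication concrete, here is a minimal sketch in plain Java (no Hadoop dependency) that mimics the two mapper variants on a single line. The class name, the method names, and the sample row `102,a,b,40` are assumptions made up for illustration; the question does not show its input data.

```java
import java.util.ArrayList;
import java.util.List;

class DuplicateEmitSketch {

    // Mimics the question's map(): the write sits inside a loop over the
    // fields, so a matching line emits one identical pair per field.
    static List<String> mapWithLoop(String line) {
        List<String> emitted = new ArrayList<>();
        String[] words = line.split(",");
        for (String word : words) {
            if (words[3].equals("40")) {
                emitted.add(words[3] + " " + Integer.parseInt(words[0]));
            }
        }
        return emitted;
    }

    // Same filter without the loop: at most one pair per input line.
    static List<String> mapOnce(String line) {
        List<String> emitted = new ArrayList<>();
        String[] words = line.split(",");
        if (words[3].equals("40")) {
            emitted.add(words[3] + " " + Integer.parseInt(words[0]));
        }
        return emitted;
    }

    public static void main(String[] args) {
        String line = "102,a,b,40"; // hypothetical input row with four fields
        System.out.println(mapWithLoop(line).size()); // prints 4
        System.out.println(mapOnce(line).size());     // prints 1
    }
}
```

With four fields per row, the looped version emits each matching pair four times, which is the kind of repetition the reducer then passes straight through.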

Posted 2021-05-30

pgky5nke #2

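In the context of the original question, there is no need for a loop in either the mapper or the reducer, because the loop is what duplicates the entries: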

public static class MapForWordCount extends Mapper<Object, Text, Text, IntWritable>{

private IntWritable saleValue = new IntWritable();
private Text rangeValue = new Text();

public void map(Object key, Text value, Context con) throws IOException, InterruptedException
{
    String line = value.toString();
    String[] words = line.split(",");
    if(words[3].equals("40")){  
       saleValue.set(Integer.parseInt(words[0]));
       rangeValue.set(words[3]);
       con.write(rangeValue , saleValue );
    }
}   
}

In the reducer, as @serhiy suggested in the original question, you need only one line of code:

public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable>  
{  
    private IntWritable result = new IntWritable();  
    public void reduce(Text word, Iterable<IntWritable> values, Context con) throws IOException, InterruptedException  
    {  
        // Writes the key once per group, with no value attached
        con.write(word, null);  
    }
}

Re-reading "Edit 1": I would leave that as a trivial exercise :)
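For readers who want to see the shape of that exercise, the following is a rough plain-Java simulation (not Hadoop code) of the loop-free mapper followed by a reducer that writes every value back out with its key. The class name, method names, and the four sample rows are assumptions chosen only so that the result lines up with the expected output in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FilterPipelineSketch {

    // Map phase plus a stand-in for the shuffle: keep rows whose fourth
    // field is "40" and group the first field under that key.
    static Map<String, List<Integer>> mapAndShuffle(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] words = line.split(",");
            if (words.length > 3 && words[3].equals("40")) {
                grouped.computeIfAbsent(words[3], k -> new ArrayList<>())
                       .add(Integer.parseInt(words[0]));
            }
        }
        return grouped;
    }

    // Reduce phase: one output record per value, so nothing is aggregated.
    static List<String> reduce(String key, List<Integer> values) {
        List<String> out = new ArrayList<>();
        for (int value : values) {
            out.add(key + " " + value);
        }
        return out;
    }

    static List<String> run(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : mapAndShuffle(lines).entrySet()) {
            out.addAll(reduce(e.getKey(), e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical rows: sale value first, range value fourth.
        List<String> lines = Arrays.asList(
                "102,x,y,40", "103,x,y,50", "104,x,y,40", "105,x,y,40");
        for (String record : run(lines)) {
            System.out.println(record); // prints 40 102, then 40 104, then 40 105
        }
    }
}
```

This simulation keeps values in input order. In a real job the order of values within one key is not guaranteed unless a secondary sort is configured, and the default text output separates key and value with a tab rather than a space.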

Posted 2021-05-30

brgchamk #3

The mapper output looks like this:

<word,count>

The reducer output looks like this:

<unique word, its total count>

Example: read a line, count every word in it, and put each one into a <key,value> pair:

<40,1>
<140,1>
<50,1>
<40,1> ..

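Here 40, 50, 140, ... are all keys, and the value is the number of occurrences of that key in a line. This happens in the mapper.
These key,value pairs are then sent to the reducer, where pairs with the same key are reduced to a single key, and all the values associated with that key are summed to give one value for that key-value pair. The result from the reducer is: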

<40,10>
<50,5>
...

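In your case, the reducer is not doing anything. The unique values/words found by the mapper are simply passed through as the output.
Ideally, you would reduce and get output such as: "40,150" was found 5 times in the same line.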
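The summing behaviour described in this answer can be sketched in plain Java without Hadoop. The class name, method names, and the sample line `40,140,50,40` are assumptions for illustration only; in a real job the framework does the grouping between the two phases.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SumReduceSketch {

    // Map phase: emit a <token, 1> pair for every comma-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.split(",")) {
            pairs.add(new SimpleEntry<>(token, 1));
        }
        return pairs;
    }

    // Reduce phase: add up all values that share a key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical input line; keys sort as strings in the TreeMap.
        System.out.println(reduce(map("40,140,50,40"))); // prints {140=1, 40=2, 50=1}
    }
}
```

The contrast with the question's reducer is that this one collapses each key to a single total, whereas the question's version writes every incoming value back out unchanged.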

Posted 2021-05-29
