Java: Why does my reducer function produce different output after I split a string and join it back together?

Posted by ht4b089n on 2021-05-29 in Hadoop

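I know this is a strange question, so let me give an example. I am writing a reducer function that concatenates the strings in the Iterator it receives. Each string in the iterator has the format "%s,%s,%s". When I write the code like this: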

public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        StringBuilder indexValue = new StringBuilder();
        while (values.hasNext()) {
            String data = values.next().toString();
            indexValue.append(data);
        }

        output.collect(key, new Text(indexValue.toString()));
}

the output I get looks correct. It has the format "%s,%s,%s%s,%s,%s...".
However, when I write the code like this:

public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        StringBuilder indexValue = new StringBuilder();
        while (values.hasNext()) {
            String data = values.next().toString();
            String [] parts = data.split(",");
            indexValue.append(parts[0] + "," + parts[1] + "," + parts[2]);
        }

        output.collect(key, new Text(indexValue.toString()));
}

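I get a completely different, strange output. First, the output does not contain all the values that should have been concatenated. Second, its shape makes no sense to me: it looks like "%s,%s,%s%s". Some information is clearly missing.
Do you know what could be causing this? I am completely stumped.
Edit: I was asked to provide the raw data, so here it is. I am also including the mapper function below.
Mapper function: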

public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        String line = value.toString();
        String [] parts = line.split("\t");

        int frequency = Integer.parseInt(parts[1]);
        String [] documentDataParts = parts[0].split(",");

        String term = documentDataParts[0];
        String bookFilename = documentDataParts[1];
        String chunk = documentDataParts[2];

        String documentData = bookFilename + "," + chunk + "," + frequency;
        output.collect(new Text(term), new Text(documentData));
}
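
To make the shape of the reducer's input concrete, here is a minimal plain-Java sketch of the same parsing that map() performs, run on one line in the style of the data sample. The class name MapperParseDemo and the helper parse are hypothetical names used only for illustration, and the sketch assumes the term data and the frequency are separated by a tab character, as the split("\t") call implies.

```java
public class MapperParseDemo {
    // Mirrors the parsing done in map():
    // "term,bookFilename,chunk<TAB>frequency" -> { term, "bookFilename,chunk,frequency" }
    public static String[] parse(String line) {
        String[] parts = line.split("\t");
        int frequency = Integer.parseInt(parts[1]);
        String[] documentDataParts = parts[0].split(",");

        String term = documentDataParts[0];
        String bookFilename = documentDataParts[1];
        String chunk = documentDataParts[2];

        String documentData = bookFilename + "," + chunk + "," + frequency;
        return new String[] { term, documentData };
    }

    public static void main(String[] args) {
        // Assumption: the real input uses a tab before the frequency.
        String[] kv = parse("Aggravateth,LeviathanbyThomasHobbes.txt,995\t1");
        // Prints: Aggravateth -> LeviathanbyThomasHobbes.txt,995,1
        System.out.println(kv[0] + " -> " + kv[1]);
    }
}
```

Each value reaching the reducer is therefore expected to contain exactly two commas. If the input were separated by spaces rather than a tab, parts[1] would not exist and the mapper would fail before emitting anything.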

Data sample:

Ages,LesMiserablesbyVictorHugo.txt,5545 1
Aggeus,LeviathanbyThomasHobbes.txt,1268 1
Aggravateth,LeviathanbyThomasHobbes.txt,995     1
Aggravateth,LeviathanbyThomasHobbes.txt,999     1
Aggravation,LeviathanbyThomasHobbes.txt,1015    1
Aggravation,LeviathanbyThomasHobbes.txt,1691    1
Aggregate,LeviathanbyThomasHobbes.txt,1293      1
Agier,LesMiserablesbyVictorHugo.txt,2790        1
Agincourt,LesMiserablesbyVictorHugo.txt,1510    1
Agn,LesMiserablesbyVictorHugo.txt,5114  1
Agnes,LesMiserablesbyVictorHugo.txt,6450        1
Agnese,LesMiserablesbyVictorHugo.txt,580        1
Agnus,UlyssesbyJamesJoyce.txt,1827      1
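
One possible explanation is worth checking, although it is only an assumption because the job driver is not shown in the question: if the same reducer class is also registered as a combiner (for example through JobConf.setCombinerClass), the reduce logic can run more than once on the same key. The plain-Java sketch below simulates that situation without Hadoop. The class CombinerEffectDemo and its methods are hypothetical names for illustration; concatRaw mirrors the first reducer body and concatSplit mirrors the second.

```java
import java.util.Arrays;
import java.util.List;

public class CombinerEffectDemo {
    // First reducer version: append every value unchanged.
    public static String concatRaw(List<String> values) {
        StringBuilder indexValue = new StringBuilder();
        for (String data : values) {
            indexValue.append(data);
        }
        return indexValue.toString();
    }

    // Second reducer version: split on "," and keep only the first three fields.
    public static String concatSplit(List<String> values) {
        StringBuilder indexValue = new StringBuilder();
        for (String data : values) {
            String[] parts = data.split(",");
            indexValue.append(parts[0] + "," + parts[1] + "," + parts[2]);
        }
        return indexValue.toString();
    }

    public static void main(String[] args) {
        List<String> mapped = Arrays.asList(
                "LeviathanbyThomasHobbes.txt,995,1",
                "LeviathanbyThomasHobbes.txt,999,1");

        // Hypothetical first pass (combiner), then second pass (reducer)
        // over the single already-concatenated value.
        String rawTwice = concatRaw(Arrays.asList(concatRaw(mapped)));
        String splitTwice = concatSplit(Arrays.asList(concatSplit(mapped)));

        // Prints: LeviathanbyThomasHobbes.txt,995,1LeviathanbyThomasHobbes.txt,999,1
        System.out.println(rawTwice);
        // Prints: LeviathanbyThomasHobbes.txt,995,1LeviathanbyThomasHobbes.txt
        System.out.println(splitTwice);
    }
}
```

Under this assumption, the first version is unaffected by a second pass, while the second version drops everything after the third comma-separated field, which gives the "%s,%s,%s%s" shape described above. If the driver does not set a combiner, the cause lies elsewhere and this sketch does not apply.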
