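I know this is a strange question, so let me give an example. I am writing a reducer function that concatenates the values from the Iterator it receives. Each string in the iterator has the format "%s,%s,%s". When I write the code like this: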
public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
    StringBuilder indexValue = new StringBuilder();
    while (values.hasNext()) {
        String data = values.next().toString();
        indexValue.append(data);
    }
    output.collect(key, new Text(indexValue.toString()));
}
The output I get appears to be correct. It has the form "%s,%s,%s%s,%s,%s..."
However, when I write the code like this:
public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
    StringBuilder indexValue = new StringBuilder();
    while (values.hasNext()) {
        String data = values.next().toString();
        String[] parts = data.split(",");
        indexValue.append(parts[0] + "," + parts[1] + "," + parts[2]);
    }
    output.collect(key, new Text(indexValue.toString()));
}
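I get a completely different, strange output. First, the output does not contain all the values that should have been concatenated. Second, its form makes no sense to me: it looks like "%s,%s,%s%s". Clearly some information is missing there.
Do you know what could be causing this? I am completely stumped.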
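One way to narrow this down is to run both loop bodies on plain strings, outside Hadoop. The sketch below tests a hypothesis and is not a confirmed diagnosis. It assumes the reduce logic may run twice over the same key, for example if the reducer class were also registered as a combiner; the job configuration is not shown here, so that is purely an assumption. The names `ReducerSketch`, `concatRaw`, and `concatFirstThree` are hypothetical, and the input values are shaped like what the mapper emits.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for the two reduce() bodies; no Hadoop classes involved.
class ReducerSketch {
    // Mirrors the first version: append every value unchanged.
    static String concatRaw(List<String> values) {
        StringBuilder indexValue = new StringBuilder();
        for (String data : values) {
            indexValue.append(data);
        }
        return indexValue.toString();
    }

    // Mirrors the second version: keep only the first three comma-separated fields.
    static String concatFirstThree(List<String> values) {
        StringBuilder indexValue = new StringBuilder();
        for (String data : values) {
            String[] parts = data.split(",");
            indexValue.append(parts[0] + "," + parts[1] + "," + parts[2]);
        }
        return indexValue.toString();
    }

    public static void main(String[] args) {
        List<String> mapped = Arrays.asList(
                "LeviathanbyThomasHobbes.txt,995,1",
                "LeviathanbyThomasHobbes.txt,999,1");

        // One pass: both versions give the same concatenation.
        System.out.println(concatRaw(mapped));
        System.out.println(concatFirstThree(mapped));

        // ASSUMPTION: a second pass sees the already-concatenated string as one value.
        List<String> secondPassRaw = Arrays.asList(concatRaw(mapped));
        List<String> secondPassSplit = Arrays.asList(concatFirstThree(mapped));

        // First version is unaffected by the extra pass.
        System.out.println(concatRaw(secondPassRaw));
        // Second version keeps only fields 0..2 of the long string:
        // prints LeviathanbyThomasHobbes.txt,995,1LeviathanbyThomasHobbes.txt
        System.out.println(concatFirstThree(secondPassSplit));
    }
}
```

Under that assumption, the last line has the "%s,%s,%s%s" shape and drops the remaining values, which resembles the symptom described. If no second pass happens in the real job, this sketch does not explain the output.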
Edit: I was asked for the raw data, so here it is. I am also providing the mapper function below.
Mapper function:
public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
    String line = value.toString();
    String[] parts = line.split("\t");
    int frequency = Integer.parseInt(parts[1]);
    String[] documentDataParts = parts[0].split(",");
    String term = documentDataParts[0];
    String bookFilename = documentDataParts[1];
    String chunk = documentDataParts[2];
    String documentData = bookFilename + "," + chunk + "," + frequency;
    output.collect(new Text(term), new Text(documentData));
}
Data sample:
Ages,LesMiserablesbyVictorHugo.txt,5545 1
Aggeus,LeviathanbyThomasHobbes.txt,1268 1
Aggravateth,LeviathanbyThomasHobbes.txt,995 1
Aggravateth,LeviathanbyThomasHobbes.txt,999 1
Aggravation,LeviathanbyThomasHobbes.txt,1015 1
Aggravation,LeviathanbyThomasHobbes.txt,1691 1
Aggregate,LeviathanbyThomasHobbes.txt,1293 1
Agier,LesMiserablesbyVictorHugo.txt,2790 1
Agincourt,LesMiserablesbyVictorHugo.txt,1510 1
Agn,LesMiserablesbyVictorHugo.txt,5114 1
Agnes,LesMiserablesbyVictorHugo.txt,6450 1
Agnese,LesMiserablesbyVictorHugo.txt,580 1
Agnus,UlyssesbyJamesJoyce.txt,1827 1
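To see what the reducer receives for one of these lines, here is a minimal sketch that repeats the mapper's parsing steps in plain Java. It assumes the gap before the trailing count is a tab character (the mapper splits on "\t"); `MapperParseSketch` and `parse` are hypothetical names used only for illustration.

```java
// Plain-Java stand-in for the parsing done in map(); no Hadoop classes involved.
class MapperParseSketch {
    // Returns {term, documentData}, i.e. the key and value the mapper would emit.
    static String[] parse(String line) {
        String[] parts = line.split("\t");              // ASSUMPTION: tab-separated count
        int frequency = Integer.parseInt(parts[1]);
        String[] documentDataParts = parts[0].split(",");
        String term = documentDataParts[0];
        String bookFilename = documentDataParts[1];
        String chunk = documentDataParts[2];
        String documentData = bookFilename + "," + chunk + "," + frequency;
        return new String[] { term, documentData };
    }

    public static void main(String[] args) {
        String[] kv = parse("Aggravateth,LeviathanbyThomasHobbes.txt,995\t1");
        System.out.println(kv[0]); // Aggravateth
        System.out.println(kv[1]); // LeviathanbyThomasHobbes.txt,995,1
    }
}
```

Each emitted value therefore has exactly three comma-separated fields, so a value with more than three fields reaching the reducer would have to come from somewhere other than the mapper.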