Java: splitting a string in Hadoop, but only getting the first index of the array

Posted by 42fyovps on 2021-05-29 in Hadoop

I've run into a problem with the split method in Hadoop. The code is very simple:

public static class Reduce extends Reducer<Text, Text, Text, Text> {

    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {

        for (Text each : values) {
            String[] array = each.toString().split(",");
            context.write(key, new Text(array[1]));
        }
    }
}

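The values are strings like "a,0,0,1". What I want to do is split each string and put the parts into an array. If I run the code above, it throws an ArrayIndexOutOfBoundsException. However, I can access index 0 of the array, and it returns "a". To figure out what was going on, I changed "array[1]" to "Integer.toString(array.length)", and the result was 1. I'm confused. I'm pretty sure the input data is correct: no null values and no extra spaces.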
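
A minimal sketch, in plain Java with no Hadoop dependency, of how the split result could be inspected and guarded. The class name SplitDebug and its helper methods are hypothetical and not part of the original code. It relies on two documented properties of String.split: a string with no "," yields a one-element array, and trailing empty strings are dropped unless a negative limit is passed. Either would be consistent with array.length being 1, although the actual cause here is not confirmed.

```java
import java.util.Arrays;

public class SplitDebug {

    // Reports how many parts split(",") produces and what they are.
    public static String describe(String value) {
        String[] parts = value.split(",");
        return parts.length + " part(s): " + Arrays.toString(parts);
    }

    // Same as describe, but a negative limit keeps trailing empty strings.
    public static String describeKeepEmpty(String value) {
        String[] parts = value.split(",", -1);
        return parts.length + " part(s): " + Arrays.toString(parts);
    }

    // Returns the field at the given index, or a fallback when the value
    // has too few fields, instead of throwing ArrayIndexOutOfBoundsException.
    public static String fieldOrDefault(String value, int index, String fallback) {
        String[] parts = value.split(",");
        return index < parts.length ? parts[index] : fallback;
    }

    public static void main(String[] args) {
        System.out.println(describe("a,0,0,1"));           // four parts
        System.out.println(describe("a"));                 // one part: no comma present
        System.out.println(describe("a,,,"));              // one part: trailing empties dropped
        System.out.println(describeKeepEmpty("a,,,"));     // four parts
        System.out.println(fieldOrDefault("a", 1, "<missing>"));
    }
}
```

Writing the output of something like describe(each.toString()) from the reducer, instead of array[1], would show exactly which strings arrive as values.
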
When I change the code to:

public static class Reduce extends Reducer<Text, Text, Text, Text> {

    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {

        for (Text each : values) {
            for (String eachPart: each.toString().split(",")){
                context.write(key, new Text(eachPart));
            }
        }
    }
}

With this version, every part of the input value is written out, one record per part (assuming the key is "0,0").

But when I try once more to put the parts into an array, like this:

public static class Reduce extends Reducer<Text, Text, Text, Text> {

    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {

        for (Text each : values) {
            int i = 0;
            String[] tmp = new String[4];
            for (String eachPart: each.toString().split(",")){
                tmp[i] = eachPart;
                i++;
                context.write(key, new Text(tmp[1]));
            }
        }
    }
}

I can't read from the array: it throws a NullPointerException. If I change "tmp[1]" to "tmp[0]", it works and returns "a". I'm confused. Does anyone have an idea? Thanks in advance.
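
One thing the last snippet does, independent of the input data, is call context.write inside the inner loop, so tmp[1] is read on the first pass, before anything has been stored there. The sketch below reproduces only that access pattern in plain Java; the class name WriteInsideLoop and its methods are hypothetical, and a list stands in for the Hadoop context. Passing that null to new Text(...) is likely what triggers the NullPointerException, though this is an assumption rather than something the original post confirms.

```java
import java.util.ArrayList;
import java.util.List;

public class WriteInsideLoop {

    // Mirrors the access pattern of the last snippet: tmp[1] is read on
    // every pass of the inner loop. Assumes at most four parts per value.
    public static List<String> readInsideLoop(String value) {
        List<String> seen = new ArrayList<>();
        String[] tmp = new String[4];
        int i = 0;
        for (String part : value.split(",")) {
            tmp[i] = part;
            i++;
            seen.add(tmp[1]); // null on the first pass: only tmp[0] is set so far
        }
        return seen;
    }

    // Alternative: split first, then read index 1 once, only if it exists.
    public static String readAfterLoop(String value) {
        String[] parts = value.split(",");
        return parts.length > 1 ? parts[1] : null;
    }

    public static void main(String[] args) {
        System.out.println(readInsideLoop("a,0,0,1")); // first element is null
        System.out.println(readAfterLoop("a,0,0,1"));
        System.out.println(readAfterLoop("a"));        // null: no second field
    }
}
```

This also matches the observation that tmp[0] works: index 0 is always assigned before the first read.
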

Java hadoop reduce split

Source: https://stackoverflow.com/questions/45936753/split-a-string-in-hadoop-but-only-got-the-first-index-of-array