Hadoop Mapper reading multiple lines

Posted by qmb5sa22 on 2021-06-04 in Hadoop


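I'm new to Hadoop. I'm trying to read an HDFS file in chunks, for example 100 lines at a time, and then run a regression on that data in the mapper using Apache's OLSMultipleLinearRegression. To read multiple lines at once I'm using the code shown here: http://bigdatacircus.com/2012/08/01/wordcount-with-custom-record-reader-of-textinputformat/
My mapper is defined as: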

public void map(LongWritable key, Text value, Context context)
        throws java.io.IOException, InterruptedException
{
    // value is expected to hold several input lines joined by "\n"
    String lines = value.toString();
    String[] lineArr = lines.split("\n");
    int lcount = lineArr.length;
    System.out.println(lcount); // prints out "1"
    // emit the number of lines seen in this record, with a count of 1
    context.write(new Text(Integer.toString(lcount)), new IntWritable(1));
}

My question is: why does System.out.println show lcount == 1? My file is delimited by "\n", and I have set nlinestoprocess=3 in the record reader. My input file has the following format:

y x1 x2 x3 x4 x5
y x1 x2 x3 x4 x5
y x1 x2 x3 x4 x5
...
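
For reference, here is a rough, stand-alone sketch of what "N lines per record" is expected to look like by the time the value reaches map(). It is plain Java, not the Hadoop RecordReader from the linked post; the class name LineChunker and the method chunk are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-alone simulation of grouping N lines into one record; not a Hadoop RecordReader. */
public class LineChunker {

    /** Joins every linesPerRecord consecutive lines with "\n" into a single record. */
    public static List<String> chunk(List<String> lines, int linesPerRecord) {
        if (linesPerRecord <= 0) {
            throw new IllegalArgumentException("linesPerRecord must be positive");
        }
        List<String> records = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerRecord) {
            // the last record may hold fewer lines than linesPerRecord
            int end = Math.min(start + linesPerRecord, lines.size());
            records.add(String.join("\n", lines.subList(start, end)));
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1 2 3 4 5 6", "2 3 4 5 6 7", "3 4 5 6 7 8", "4 5 6 7 8 9");
        for (String record : chunk(lines, 3)) {
            // each record should split back into the lines it was built from
            System.out.println(record.split("\n").length + " line(s) in record");
        }
    }
}
```

If the value arriving in the mapper does not look like one of these joined records, the grouping step is probably where to look first.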

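I can't run my multiple regression if I only read one line at a time, because the regression API needs multiple data points. Thanks for your help.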
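
Once a record really does contain several "y x1 x2 x3 x4 x5" rows, they still have to be turned into numeric arrays. The sketch below shows one way that could be done in plain Java. RegressionInput and parse are hypothetical names, there is no validation of blank or malformed rows, and the assumption is that the resulting arrays would then be handed to something like OLSMultipleLinearRegression.newSampleData(double[] y, double[][] x) from Apache Commons Math.

```java
/** Parses a multi-line record of "y x1 ... xk" rows into arrays for a regression library. */
public class RegressionInput {
    public final double[] y;   // one response value per row
    public final double[][] x; // one row of regressors per observation

    private RegressionInput(double[] y, double[][] x) {
        this.y = y;
        this.x = x;
    }

    /** Assumes rows are separated by "\n" and fields by whitespace; no error handling. */
    public static RegressionInput parse(String record) {
        String[] rows = record.trim().split("\n");
        double[] y = new double[rows.length];
        double[][] x = new double[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            String[] fields = rows[i].trim().split("\\s+");
            y[i] = Double.parseDouble(fields[0]);      // first column is y
            x[i] = new double[fields.length - 1];      // remaining columns are x1..xk
            for (int j = 1; j < fields.length; j++) {
                x[i][j - 1] = Double.parseDouble(fields[j]);
            }
        }
        return new RegressionInput(y, x);
    }

    public static void main(String[] args) {
        RegressionInput in = parse("1.0 2 3 4 5 6\n2.0 3 4 5 6 7\n");
        System.out.println(in.y.length + " rows, " + in.x[0].length + " regressors");
    }
}
```

Keep in mind that a least-squares fit with five regressors plus an intercept needs at least six rows to be determined, so a chunk of 3 lines would likely be too small for the regression step itself.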

Java hadoop Input Mapper

Source: https://stackoverflow.com/questions/14669282/hadoop-mapper-reading-multiple-lines

1 answer


vohkndzv #1

String.split() takes a regular expression as its argument. You have to double-escape the backslash.

String []lineArr = lines.split("\\n");
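
To see what each pattern does on its own, a small check like the one below may help. It is only an illustration with made-up names (SplitCheck, countLines), run outside Hadoop. As far as I know, Java's regex engine treats a literal line feed and the regex escape \n the same way, so if both forms still give 1 inside the mapper, the value probably holds a single line and the record reader setup is worth a second look.

```java
public class SplitCheck {

    /** Returns how many pieces String.split produces for the given regex. */
    public static int countLines(String text, String regex) {
        return text.split(regex).length;
    }

    public static void main(String[] args) {
        String threeLines = "y x1 x2\ny x1 x2\ny x1 x2";
        // pattern is a single literal line-feed character
        System.out.println(countLines(threeLines, "\n"));
        // pattern is the two-character regex escape \n
        System.out.println(countLines(threeLines, "\\n"));
        // text without any line feed: split returns the whole string as one piece
        System.out.println(countLines("y x1 x2", "\\n"));
    }
}
```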

2021-06-04
