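I am trying to write MapReduce code in which the mapper receives a file of comma-separated records. I split each line on the comma, build a set from each line, and finally add every set to a list.
I am not sure how to write this list to the context and pass it to the reducer, or, more precisely, which data type MapReduce uses for a Java list collection. I have heard of ArrayWritable but could not get it to work in my code. Please see the code below: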
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable>
{
    List<HashSet<String>> listA = new CopyOnWriteArrayList<>();
    Set<Set<String>> finalset = new HashSet<>();
    Set<String> hs3;

    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
    {
        String line = value.toString();
        String[] str = line.split(",");
        // Build one set per input line
        HashSet<String> hs = new HashSet<>();
        for (int i = 0; i < str.length; i++)
        {
            hs.add(str[i]);
        }
        listA.add(hs);
        // Placeholder: listtobepassed stands for listA; which Writable type it needs is the open question
        context.write(key, listtobepassed); //listA in this case
    }
}
The input format is as follows:
N3H1P7RBS4QM_71,N3H1P7RBS4QM_72,N3H1P7RBS4QM_73
N3H1P7RBS4QM_143,N3H1P7RBS4QM_144,N3H1P7RBS4QM_167046,N3H1P7RBS4QM_328681,N3H1P7RBS4QM_328682
N3H1P7RBS4QM_145,N3H1P7RBS4QM_167047,N3H1P7RBS4QM_328683
N3H1P7RBS4QM_193,N3H1P7RBS4QM_194,N3H1P7RBS4QM_167088,N3H1P7RBS4QM_167089
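On the data-type question, a minimal sketch may help. Hadoop passes mapper output to the reducer as `Writable` objects, and that interface consists of two methods, `write(DataOutput)` and `readFields(DataInput)`. The class below mirrors those two signatures using only `java.io`, so it runs without Hadoop on the classpath. `StringSetWritable`, `fromLine` and `roundTrip` are hypothetical names introduced for illustration; in a real job the class would be public and would also declare `implements org.apache.hadoop.io.Writable`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical value type holding the set built from one input line.
// Assumption: in a Hadoop job this class would declare
// "implements org.apache.hadoop.io.Writable"; write and readFields
// below have the same signatures as that interface.
class StringSetWritable {
    private final Set<String> values = new LinkedHashSet<>();

    public Set<String> get() {
        return values;
    }

    // Serialize the element count first, then each element.
    public void write(DataOutput out) throws IOException {
        out.writeInt(values.size());
        for (String v : values) {
            out.writeUTF(v);
        }
    }

    // Clear before reading, since the same instance may be reused.
    public void readFields(DataInput in) throws IOException {
        values.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            values.add(in.readUTF());
        }
    }

    // Illustrative helper: build the set from one comma-separated line.
    static StringSetWritable fromLine(String line) {
        StringSetWritable w = new StringSetWritable();
        for (String field : line.split(",")) {
            w.values.add(field.trim());
        }
        return w;
    }

    // Illustrative helper: serialize to bytes and read back into a new object.
    static StringSetWritable roundTrip(StringSetWritable source) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            source.write(new DataOutputStream(buf));
            StringSetWritable copy = new StringSetWritable();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringSetWritable w = fromLine("N3H1P7RBS4QM_71,N3H1P7RBS4QM_72,N3H1P7RBS4QM_73");
        System.out.println(roundTrip(w).get());
    }
}
```

With a type like this, the mapper would presumably be declared with it as the output value type (for example `Mapper<LongWritable, Text, Text, StringSetWritable>`), and the driver would register it with `job.setMapOutputValueClass(...)`. Hadoop's built-in `ArrayWritable` is an alternative, but as far as its documentation goes it needs a small subclass that fixes the element class before it can be used as reducer input.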
Updated question: I am trying to run a transitive closure program over the input with MapReduce. Example input:
Ref1, Ref2, Ref3
Ref3
Ref4, Ref5, Ref6
Ref7, Ref8, Ref9
Ref9
Ref10, Ref11
Ref11, Ref12
The output should look like this:
Ref1, Ref2, Ref3,Ref4,Ref5,Ref6
Ref7, Ref8, Ref9
Ref10, Ref11, Ref12
Lines that share a common input value are merged together.
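The merge rule just described (lines sharing any value end up in one group) can be sketched on a single machine with a union-find structure, which avoids repeatedly comparing every set against every other set. This is only an illustration under assumptions: `MergeGroups` and `merge` are hypothetical names, and fields are trimmed because the sample input has spaces after the commas. Note that for the sample exactly as printed, the `Ref4, Ref5, Ref6` line shares no value with the `Ref1` line, so this sketch keeps them as separate groups.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups references that are linked through shared lines.
class MergeGroups {
    // Maps each reference to its parent; a root is its own parent.
    private final Map<String, String> parent = new LinkedHashMap<>();

    // Return the representative of x, registering x the first time it is seen.
    private String find(String x) {
        parent.putIfAbsent(x, x);
        String root = x;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        // Path compression: point every node on the path directly at the root.
        while (!parent.get(x).equals(root)) {
            String next = parent.get(x);
            parent.put(x, root);
            x = next;
        }
        return root;
    }

    // Join the groups containing a and b.
    private void union(String a, String b) {
        String ra = find(a);
        String rb = find(b);
        if (!ra.equals(rb)) {
            parent.put(rb, ra);
        }
    }

    // Merge all comma-separated lines that share at least one reference.
    static List<List<String>> merge(List<String> lines) {
        MergeGroups uf = new MergeGroups();
        for (String line : lines) {
            String first = null;
            for (String field : line.split(",")) {
                String ref = field.trim();
                if (ref.isEmpty()) {
                    continue;
                }
                if (first == null) {
                    first = ref;
                    uf.find(ref); // register a line that has a single reference
                } else {
                    uf.union(first, ref);
                }
            }
        }
        // Collect references by representative, in order of first appearance.
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String ref : new ArrayList<>(uf.parent.keySet())) {
            groups.computeIfAbsent(uf.find(ref), k -> new ArrayList<>()).add(ref);
        }
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Ref1, Ref2, Ref3", "Ref3", "Ref4, Ref5, Ref6",
                "Ref7, Ref8, Ref9", "Ref9", "Ref10, Ref11", "Ref11, Ref12");
        for (List<String> group : merge(lines)) {
            System.out.println(String.join(", ", group));
        }
    }
}
```

Each line costs time roughly proportional to its number of fields, so a single pass over the file is enough. If the data does not fit on one machine, the same grouping is usually framed as a connected-components problem, which tends to need several MapReduce rounds rather than one.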
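My current plain-Java code looks like this:
```
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArrayList;

public class TestList {
    public static void main(String[] args) throws IOException
    {
        // CopyOnWriteArrayList allows the list to be modified while it is being iterated
        List<HashSet<String>> ls = new CopyOnWriteArrayList<>();
        BufferedReader br = new BufferedReader(new FileReader("/inputfilepath"));
        String data;
        while ((data = br.readLine()) != null)
        {
            // Build one set per input line
            String[] dataLine = data.split(",");
            HashSet<String> hs = new HashSet<>();
            for (int i = 0; i < dataLine.length; i++)
            {
                hs.add(dataLine[i]);
            }
            ls.add(hs);
        }
        br.close();
        //System.out.println(ls.iterator().next());
        Iterator<HashSet<String>> itr = ls.iterator();
        HashSet<String> hs2 = null;
        while (itr.hasNext())
        {
            HashSet<String> ele = itr.next();
            for (HashSet<String> hs1 : ls)
            {
                // Merge any two sets that share at least one element
                if (!Collections.disjoint(hs1, ele))
                {
                    hs2 = new HashSet<String>(hs1);
                    hs2.addAll(ele);
                    ls.remove(ele);
                    ls.remove(hs1);
                }
            }
            ls.add(hs2);
        }
        BufferedWriter bw = new BufferedWriter(new FileWriter("transitiveoutput.txt"));
        for (Set<String> s : ls)
        {
            String row = s.toString().replace("[", "").replace("]", "").trim();
            bw.write(row);
            bw.newLine();
            System.out.println(row);
        }
        System.out.println("Transitive Closure completed.....");
        bw.close();
    }
}
```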
I can produce the output with plain Java, but on large files it seems to take forever, so I want to use MapReduce instead.