Passing a Java List from mapper to reducer

Posted by fae0ux8s on 2021-06-02 in Hadoop

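I am trying to write MapReduce code in which the mapper receives file input with comma-separated record values. I split each line on commas, build a set for each line, and finally add each set to a list.
I am not sure how to write this to the context and pass it to the reducer, or more precisely, which data type to use in MapReduce for a Java list collection. I have heard of ArrayWritable but could not get it to work in my code. Please see the code below: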

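public static class Map extends Mapper<LongWritable, Text, Text, IntWritable>
{
    // Sets built from the lines seen so far by this mapper instance
    List<Set<String>> listA = new CopyOnWriteArrayList<>();
    Set<Set<String>> finalset = new HashSet<>();
    Set<String> hs3;

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
    {
        // Split the comma-separated record and collect its values into a set
        String line = value.toString();
        String[] str = line.split(",");
        HashSet<String> hs = new HashSet<>();
        for (int i = 0; i < str.length; i++)
        {
            hs.add(str[i]);
        }
        listA.add(hs);

        // Does not compile as written: listtobepassed is a placeholder for the
        // collection to emit (listA in this case), and the output types declared
        // on the Mapper are Text/IntWritable
        context.write(key, listtobepassed);
    }
}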
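
One way to approach the data-type question is a custom value class. Hadoop serializes map output through the `Writable` interface, whose two methods are `write(DataOutput)` and `readFields(DataInput)`. The sketch below is an assumption about how such a class might look, not a tested job component: `StringSetValue` and `roundTrip` are hypothetical names, and the `implements Writable` clause is left out so the example compiles with only the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical value type. In a real job it would declare
// "implements org.apache.hadoop.io.Writable" (omitted here so it compiles without Hadoop).
class StringSetValue {
    private final Set<String> items = new LinkedHashSet<>();

    // Hadoop creates value objects reflectively, so a no-arg constructor is needed
    StringSetValue() {}

    StringSetValue(Set<String> source) {
        items.addAll(source);
    }

    Set<String> get() {
        return items;
    }

    // Same signature as Writable.write: element count first, then each element
    public void write(DataOutput out) throws IOException {
        out.writeInt(items.size());
        for (String s : items) {
            out.writeUTF(s);
        }
    }

    // Same signature as Writable.readFields: clear first, because instances may be reused
    public void readFields(DataInput in) throws IOException {
        items.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            items.add(in.readUTF());
        }
    }

    // Demo helper (hypothetical): serialize to bytes and read back into a fresh object
    static StringSetValue roundTrip(StringSetValue v) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            v.write(new DataOutputStream(buf));
            StringSetValue copy = new StringSetValue();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a class like this, the mapper's declared output value type would presumably become `StringSetValue`, and each call to `map` would emit the set for its own line rather than the accumulated `listA`. A simpler alternative, if the values never contain commas, may be to emit the joined line as `Text` and split it again in the reducer.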

The input format is as follows:

N3H1P7RBS4QM_71,N3H1P7RBS4QM_72,N3H1P7RBS4QM_73
N3H1P7RBS4QM_143,N3H1P7RBS4QM_144,N3H1P7RBS4QM_167046,N3H1P7RBS4QM_328681,N3H1P7RBS4QM_328682
N3H1P7RBS4QM_145,N3H1P7RBS4QM_167047,N3H1P7RBS4QM_328683
N3H1P7RBS4QM_193,N3H1P7RBS4QM_194,N3H1P7RBS4QM_167088,N3H1P7RBS4QM_167089

Updated question: I am trying to run a transitive closure program over the input with MapReduce. Example input:

Ref1, Ref2, Ref3
Ref3
Ref4, Ref5, Ref6
Ref7, Ref8, Ref9
Ref9
Ref10, Ref11
Ref11, Ref12

The output should look like this:

Ref1, Ref2, Ref3, Ref4, Ref5, Ref6
Ref7, Ref8, Ref9
Ref10, Ref11, Ref12

Lines that share a common value are merged together.
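My Java code looks like this:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArrayList;

public class TestList {
    public static void main(String[] args) throws IOException
    {
        // CopyOnWriteArrayList lets the list be modified while it is being iterated
        List<HashSet<String>> ls = new CopyOnWriteArrayList<>();

        // Read each line into its own set of comma-separated values
        BufferedReader br = new BufferedReader(new FileReader("/inputfilepath"));
        String data;
        while ((data = br.readLine()) != null)
        {
            String[] dataLine = data.split(",");
            HashSet<String> hs = new HashSet<>();
            for (int i = 0; i < dataLine.length; i++)
            {
                hs.add(dataLine[i]);
            }
            ls.add(hs);
        }
        br.close();

        // Compare every set against every set currently in the list and merge overlaps
        Iterator<HashSet<String>> itr = ls.iterator();
        HashSet<String> hs2 = null;
        while (itr.hasNext())
        {
            HashSet<String> ele = itr.next();
            for (HashSet<String> hs1 : ls)
            {
                if (!Collections.disjoint(hs1, ele))
                {
                    hs2 = new HashSet<String>(hs1);
                    hs2.addAll(ele);
                    ls.remove(ele);
                    ls.remove(hs1);
                }
            }
            ls.add(hs2);
        }

        // Write each merged set as one comma-separated line
        BufferedWriter bw = new BufferedWriter(new FileWriter("transitiveoutput.txt"));
        for (Set<String> s : ls)
        {
            bw.write(s.toString().replace("[", "").replace("]", "").trim());
            bw.newLine();
            System.out.println(s.toString().replace("[", "").replace("]", "").trim());
        }
        System.out.println("Transitive Closure completed.....");
        bw.close();
    }
}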

I can produce this output with plain Java, but on large files it takes an extremely long time, so I want to use MapReduce instead.
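
The merge rule described above (lines that share a value end up in one group) can be read as a connected-components problem, which a union-find structure typically handles in near-linear time on a single machine. The sketch below is one possible reading of the requirement, not the asker's method and not a MapReduce solution. `TransitiveMerge` and `merge` are hypothetical names, and it assumes values should be trimmed of surrounding spaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TransitiveMerge {
    // Parent pointer for each value seen; a value that maps to itself is a group root
    private final Map<String, String> parent = new HashMap<>();

    // Return the root of x's group, registering x if it is new
    private String find(String x) {
        parent.putIfAbsent(x, x);
        String root = x;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        // Path compression: point every node on the walked path directly at the root
        while (!parent.get(x).equals(root)) {
            String next = parent.get(x);
            parent.put(x, root);
            x = next;
        }
        return root;
    }

    // Join the groups that contain a and b
    private void union(String a, String b) {
        String ra = find(a);
        String rb = find(b);
        if (!ra.equals(rb)) {
            parent.put(ra, rb);
        }
    }

    // Group all values so that any two lines sharing a value land in the same set
    static List<Set<String>> merge(List<String> lines) {
        TransitiveMerge uf = new TransitiveMerge();
        for (String line : lines) {
            String first = null;
            for (String token : line.split(",")) {
                String value = token.trim();
                if (value.isEmpty()) {
                    continue;
                }
                if (first == null) {
                    first = value;
                    uf.find(value); // register single-value lines too
                } else {
                    uf.union(first, value);
                }
            }
        }
        // Collect values by root; the order of the groups is not specified
        Map<String, Set<String>> groups = new LinkedHashMap<>();
        for (String value : new ArrayList<>(uf.parent.keySet())) {
            groups.computeIfAbsent(uf.find(value), r -> new TreeSet<>()).add(value);
        }
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Ref7, Ref8, Ref9", "Ref9", "Ref10, Ref11", "Ref11, Ref12");
        for (Set<String> group : merge(lines)) {
            System.out.println(String.join(", ", group));
        }
    }
}
```

Each line costs roughly one union per value, so the pairwise `Collections.disjoint` comparison between every two sets is avoided. Whether this is fast enough to make MapReduce unnecessary depends on whether all distinct values fit in memory.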
