Usage of org.apache.spark.rdd.RDD.collect() with code examples


This article collects Java code examples for the org.apache.spark.rdd.RDD.collect method and shows how RDD.collect is used in practice. The examples come from selected open-source projects on GitHub, Stack Overflow, and Maven, and should be useful as references. Details of the RDD.collect method:

Package: org.apache.spark.rdd
Class: RDD
Method: collect

About RDD.collect

collect() is an action that returns an array containing every element of the RDD. Because all of the data is transferred to the driver, it should only be called on RDDs small enough to fit in driver memory; for large datasets, prefer take(n) or write the RDD out to storage instead.
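
Before the project-sourced examples, here is a minimal, self-contained sketch of collect() through the Java API. The application name, master URL, and variable names are illustrative assumptions, not from the original article:

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class CollectExample {
 public static void main(String[] args) {
  // Local-mode context, for illustration only.
  SparkConf conf = new SparkConf().setAppName("collect-example").setMaster("local[2]");
  try (JavaSparkContext sc = new JavaSparkContext(conf)) {
   JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));
   // collect() materializes every element of the RDD on the driver.
   List<Integer> all = rdd.collect();
   System.out.println(all); // [1, 2, 3, 4, 5]
  }
 }
}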

Code examples

Code example source: org.apache.spark/spark-core (the same test appears verbatim in spark-core_2.11 and spark-core_2.10)

@Test
public void collectUnderlyingScalaRDD() {
 List<SomeCustomClass> data = new ArrayList<>();
 for (int i = 0; i < 100; i++) {
  data.add(new SomeCustomClass());
 }
 JavaRDD<SomeCustomClass> rdd = sc.parallelize(data);
 // rdd.rdd() exposes the underlying Scala RDD; retag supplies the element class
 // that Java's erased generics lose, so collect() can build a correctly typed array.
 SomeCustomClass[] collected =
  (SomeCustomClass[]) rdd.rdd().retag(SomeCustomClass.class).collect();
 assertEquals(data.size(), collected.length);
}
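
For most Java code, the detour through the underlying Scala RDD is unnecessary: JavaRDD.collect() returns a typed List directly. A minimal sketch, reusing the sc and data from the test above:

// Simpler alternative: JavaRDD.collect() returns List<T>, so no retag or array cast is needed.
JavaRDD<SomeCustomClass> javaRdd = sc.parallelize(data);
List<SomeCustomClass> collectedList = javaRdd.collect();
assertEquals(data.size(), collectedList.size());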

Code example source: uber/uberscriptquery

// wholeTextFiles reads each file as one (path, content) record; collect() pulls all of them to the driver.
Tuple2<String, String>[] tuples = (Tuple2<String, String>[]) sparkSession.sparkContext().wholeTextFiles(query, 1).collect();
// The query text is the content of the first (and only) file.
query = tuples[0]._2();
System.out.println("Query: " + query);
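
The unchecked array cast can be avoided by going through the Java API instead. A sketch assuming a JavaSparkContext named jsc and a file path in path (both names are illustrative):

// Java API equivalent: wholeTextFiles returns a JavaPairRDD of (path, content) pairs,
// and collect() yields a typed List, so no cast is required.
JavaPairRDD<String, String> files = jsc.wholeTextFiles(path, 1);
List<Tuple2<String, String>> pairs = files.collect();
String content = pairs.get(0)._2();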

Code example source: com.stratio.deep/deep-cassandra

public static <W> void doCql3SaveToCassandra(RDD<W> rdd, ICassandraDeepJobConfig<W> writeConfig,
                       Function1<W, Tuple2<Cells, Cells>> transformer) {
  if (!writeConfig.getIsWriteConfig()) {
    throw new IllegalArgumentException("Provided configuration object is not suitable for writing");
  }
  // Map each element to a (keys, values) pair of Cells; the ClassTag is required by the Scala RDD API.
  Tuple2<Map<String, ByteBuffer>, Map<String, ByteBuffer>> tuple = new Tuple2<>(null, null);
  RDD<Tuple2<Cells, Cells>> mappedRDD = rdd.map(transformer,
      ClassTag$.MODULE$.<Tuple2<Cells, Cells>>apply(tuple.getClass()));
  ((CassandraDeepJobConfig) writeConfig).createOutputTableIfNeeded(mappedRDD.first());
  final int pageSize = writeConfig.getBatchSize();
  int offset = 0;
  // collect() brings the entire RDD to the driver, so this approach only suits
  // datasets small enough to fit in driver memory.
  List<Tuple2<Cells, Cells>> elements = Arrays.asList((Tuple2<Cells, Cells>[]) mappedRDD.collect());
  List<Tuple2<Cells, Cells>> split;
  do {
    // Take the next page of at most pageSize elements.
    split = elements.subList(pageSize * (offset++), Math.min(pageSize * offset, elements.size()));
    Batch batch = QueryBuilder.batch();
    for (Tuple2<Cells, Cells> t : split) {
      Tuple2<String[], Object[]> bindVars = Utils.prepareTuple4CqlDriver(t);
      Insert insert = QueryBuilder
          .insertInto(quote(writeConfig.getKeyspace()), quote(writeConfig.getTable()))
          .values(bindVars._1(), bindVars._2());
      batch.add(insert);
    }
    // Write the page as one CQL batch; stop once a short (final) page has been written.
    writeConfig.getSession().execute(batch);
  } while (!split.isEmpty() && split.size() == pageSize);
}
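
The collect-then-page idiom above is worth isolating. A minimal generic sketch (method and parameter names are mine, not from the project); unlike the do/while above, it never issues an empty trailing batch when the element count is an exact multiple of pageSize:

import java.util.List;
import java.util.function.Consumer;

// Slice a fully collected list into fixed-size pages and hand each page to a writer.
static <T> void writeInPages(List<T> elements, int pageSize, Consumer<List<T>> writer) {
  for (int start = 0; start < elements.size(); start += pageSize) {
    int end = Math.min(start + pageSize, elements.size());
    writer.accept(elements.subList(start, end)); // each page holds at most pageSize items
  }
}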
