Usage of org.apache.spark.api.java.JavaPairRDD.<init>() with code examples


This article collects code examples of the Java method org.apache.spark.api.java.JavaPairRDD.<init>(), showing how the constructor is used in practice. The examples were extracted from selected open-source projects on GitHub, Stack Overflow, and Maven, so they should serve as a solid reference. Details of JavaPairRDD.<init>() follow:
Package path: org.apache.spark.api.java.JavaPairRDD
Class name: JavaPairRDD
Method name: <init>

About JavaPairRDD.<init>

No official description is available. As the examples below show, the constructor takes a Scala RDD<Tuple2<K, V>> plus ClassTag instances for the key and value types, and wraps them into a JavaPairRDD<K, V> usable from Java.
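For orientation, here is a minimal, self-contained sketch of calling the constructor directly. The class and variable names are illustrative; obtaining ClassTags via ClassTag$.MODULE$.apply(...) is one plain-Scala-API option, while the projects below use their own helpers (SparkUtil.getManifest, CassandraJavaUtil.classTag).

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.rdd.RDD;

import scala.Tuple2;
import scala.reflect.ClassTag;
import scala.reflect.ClassTag$;

public class JavaPairRDDInitExample {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("JavaPairRDDInit").setMaster("local[*]");
    try (JavaSparkContext sc = new JavaSparkContext(conf)) {
      // A JavaRDD of Scala tuples -- the raw material for a JavaPairRDD.
      JavaRDD<Tuple2<String, Integer>> tuples = sc.parallelize(Arrays.asList(
          new Tuple2<>("a", 1), new Tuple2<>("b", 2)));

      // Unwrap to the underlying Scala RDD<Tuple2<K, V>> that the constructor expects.
      RDD<Tuple2<String, Integer>> scalaRdd = tuples.rdd();

      // The constructor also needs ClassTags carrying the runtime key/value types.
      ClassTag<String> kTag = ClassTag$.MODULE$.apply(String.class);
      ClassTag<Integer> vTag = ClassTag$.MODULE$.apply(Integer.class);

      JavaPairRDD<String, Integer> pairs = new JavaPairRDD<>(scalaRdd, kTag, vTag);
      System.out.println(pairs.collectAsMap()); // {a=1, b=2}
    }
  }
}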

Code examples

Code example from: com.datastax.spark/spark-cassandra-connector-unshaded

/**
 * Groups items with the same key, assuming the items with the same key are next to each other
 * in the collection. It does not perform a shuffle, therefore it is much faster than the far
 * more general Spark RDD `groupByKey`. For this method to be useful with Cassandra tables, the
 * key must represent a prefix of the primary key, containing at least the partition key of the
 * Cassandra table.
 */
public JavaPairRDD<K, Collection<V>> spanByKey(ClassTag<K> keyClassTag) {
  ClassTag<Tuple2<K, Collection<V>>> tupleClassTag = classTag(Tuple2.class);
  ClassTag<Collection<V>> vClassTag = classTag(Collection.class);
  RDD<Tuple2<K, Collection<V>>> newRDD = pairRDDFunctions.spanByKey()
      .map(JavaApiHelper.<K, V, Seq<V>>valuesAsJavaCollection(), tupleClassTag);

  return new JavaPairRDD<>(newRDD, keyClassTag, vClassTag);
}

The identical spanByKey implementation also ships in com.datastax.spark/spark-cassandra-connector, spark-cassandra-connector_2.10, spark-cassandra-connector-java, and spark-cassandra-connector-java_2.10.
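To make the contrast with groupByKey concrete, here is a plain-Java sketch (not connector code) of the span-by-key semantics on an in-memory list: consecutive entries with the same key collapse into one group, while a key that reappears later, after a different key, starts a new group.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import scala.Tuple2;

// Illustrative only: span-by-key semantics on a local list, no Spark involved.
public class SpanByKeySketch {
  static <K, V> List<Tuple2<K, List<V>>> spanByKey(List<Tuple2<K, V>> input) {
    List<Tuple2<K, List<V>>> groups = new ArrayList<>();
    K currentKey = null;
    List<V> currentValues = null;
    for (Tuple2<K, V> entry : input) {
      if (currentValues == null || !entry._1().equals(currentKey)) {
        // Key changed: start a new group (even if this key was seen earlier).
        currentKey = entry._1();
        currentValues = new ArrayList<>();
        groups.add(new Tuple2<>(currentKey, currentValues));
      }
      currentValues.add(entry._2());
    }
    return groups;
  }

  public static void main(String[] args) {
    List<Tuple2<String, Integer>> input = Arrays.asList(
        new Tuple2<>("a", 1), new Tuple2<>("a", 2),
        new Tuple2<>("b", 3),
        new Tuple2<>("a", 4)); // "a" again, but not adjacent
    // Prints [(a,[1, 2]), (b,[3]), (a,[4])] -- groupByKey would instead give a=[1, 2, 4].
    System.out.println(spanByKey(input));
  }
}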

Code example from: com.datastax.spark/spark-cassandra-connector

/**
 * Applies a function to each item and groups consecutive items having the same value together.
 * Unlike {@code groupBy}, items from the same group must already be next to each other in the
 * original collection. Works locally on each partition, so items from different partitions will
 * never be placed in the same group.
 */
public <U> JavaPairRDD<U, Iterable<T>> spanBy(final Function<T, U> f, ClassTag<U> keyClassTag) {
  ClassTag<Tuple2<U, Iterable<T>>> tupleClassTag = classTag(Tuple2.class);
  ClassTag<Iterable<T>> iterableClassTag = CassandraJavaUtil.classTag(Iterable.class);
  RDD<Tuple2<U, Iterable<T>>> newRDD = rddFunctions.spanBy(toScalaFunction1(f))
      .map(JavaApiHelper.<U, T, scala.collection.Iterable<T>>valuesAsJavaIterable(), tupleClassTag);
  return new JavaPairRDD<>(newRDD, keyClassTag, iterableClassTag);
}

The identical spanBy implementation also ships in com.datastax.spark/spark-cassandra-connector-java_2.10, spark-cassandra-connector-java, spark-cassandra-connector_2.10, and spark-cassandra-connector-unshaded.
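A typical call site might look like the sketch below. The keyspace, table, and column names are hypothetical; javaFunctions(...), cassandraTable(...), and CassandraJavaUtil.classTag(...) are the connector's Java API entry points.

import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;

import com.datastax.spark.connector.japi.CassandraJavaUtil;
import com.datastax.spark.connector.japi.CassandraRow;
import com.datastax.spark.connector.japi.rdd.CassandraJavaRDD;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SpanByUsageSketch {
  static JavaPairRDD<Integer, Iterable<CassandraRow>> groupBySensor(JavaSparkContext sc) {
    // Rows come back in token/clustering order, so rows of one Cassandra
    // partition are adjacent -- exactly the precondition spanBy needs.
    CassandraJavaRDD<CassandraRow> rows =
        javaFunctions(sc).cassandraTable("demo_ks", "readings"); // hypothetical keyspace/table
    return rows.spanBy(
        row -> row.getInt("sensor_id"),                          // hypothetical column
        CassandraJavaUtil.classTag(Integer.class));
  }
}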

Code example from: org.apache.pig/pig. Pig-on-Spark's POSort converter wraps a Scala RDD of key/value tuples with the constructor, then sorts by key using Pig's comparator:

@Override
public RDD<Tuple> convert(List<RDD<Tuple>> predecessors, POSort sortOperator)
    throws IOException {
  SparkUtil.assertPredecessorSize(predecessors, sortOperator, 1);
  RDD<Tuple> rdd = predecessors.get(0);
  int parallelism = SparkPigContext.get().getParallelism(predecessors, sortOperator);
  RDD<Tuple2<Tuple, Object>> rddPair = rdd.map(new ToKeyValueFunction(),
      SparkUtil.<Tuple, Object> getTuple2Manifest());
  JavaPairRDD<Tuple, Object> r = new JavaPairRDD<Tuple, Object>(rddPair,
      SparkUtil.getManifest(Tuple.class),
      SparkUtil.getManifest(Object.class));
  JavaPairRDD<Tuple, Object> sorted = r.sortByKey(
      sortOperator.getMComparator(), true, parallelism);
  JavaRDD<Tuple> mapped = sorted.mapPartitions(TO_VALUE_FUNCTION);
  return mapped.rdd();
}
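Stripped of Pig's operator classes, the wrap-then-sort pattern above reduces to the following generic sketch (names and data are made up; the custom comparator must be serializable because sortByKey ships it to the executors):

import java.io.Serializable;
import java.util.Arrays;
import java.util.Comparator;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class SortByKeySketch {
  // Custom key order: shorter strings first, ties broken alphabetically.
  static class LengthThenAlpha implements Comparator<String>, Serializable {
    @Override public int compare(String a, String b) {
      int c = Integer.compare(a.length(), b.length());
      return c != 0 ? c : a.compareTo(b);
    }
  }

  static JavaPairRDD<String, Integer> sorted(JavaSparkContext sc, int parallelism) {
    JavaPairRDD<String, Integer> pairs = sc.parallelizePairs(Arrays.asList(
        new Tuple2<>("bbb", 1), new Tuple2<>("a", 2), new Tuple2<>("cc", 3)));
    // Same shape as the Pig converter: custom comparator, ascending, parallelism.
    return pairs.sortByKey(new LengthThenAlpha(), true, parallelism);
  }
}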

Code example from: org.apache.pig/pig. An excerpt from Pig's skewed-join implementation; both sides of the join are wrapped into JavaPairRDDs keyed by PartitionIndexedKey:

JavaPairRDD<PartitionIndexedKey, Tuple> skewIndexedJavaPairRDD = new JavaPairRDD<PartitionIndexedKey, Tuple>(
    skewIdxKeyRDD, SparkUtil.getManifest(PartitionIndexedKey.class),
    SparkUtil.getManifest(Tuple.class));
JavaPairRDD<PartitionIndexedKey, Tuple> streamIndexedJavaPairRDD = new JavaPairRDD<PartitionIndexedKey, Tuple>(
    streamIdxKeyJavaRDD.rdd(), SparkUtil.getManifest(PartitionIndexedKey.class),
    SparkUtil.getManifest(Tuple.class));

Code example from: org.apache.pig/pig. Pig's sample-sort converter: sort the sampled data, then collect everything under a single key:

@Override
public RDD<Tuple> convert(List<RDD<Tuple>> predecessors, POSampleSortSpark sortOperator)
    throws IOException {
  SparkUtil.assertPredecessorSize(predecessors, sortOperator, 1);
  RDD<Tuple> rdd = predecessors.get(0);
  RDD<Tuple2<Tuple, Object>> rddPair = rdd.map(new ToKeyValueFunction(),
      SparkUtil.<Tuple, Object> getTuple2Manifest());
  JavaPairRDD<Tuple, Object> r = new JavaPairRDD<Tuple, Object>(rddPair,
      SparkUtil.getManifest(Tuple.class),
      SparkUtil.getManifest(Object.class));
  // sort the sample data
  JavaPairRDD<Tuple, Object> sorted = r.sortByKey(true);
  // convert every element of the sample data to ("all", element) format
  JavaPairRDD<String, Tuple> mapped = sorted.mapPartitionsToPair(new AggregateFunction());
  // use groupByKey to aggregate all values (the result looks like (all, {(sampleEle1), (sampleEle2), ...}))
  JavaRDD<Tuple> groupByKey = mapped.groupByKey().map(new ToValueFunction());
  return groupByKey.rdd();
}

Code example from: org.apache.pig/pig. Secondary-sort handling for POReduceBySpark:

private JavaRDD<Tuple2<IndexedKey, Tuple>> handleSecondarySort(
    RDD<Tuple> rdd, POReduceBySpark op, int parallelism) {
  RDD<Tuple2<Tuple, Object>> rddPair = rdd.map(new ToKeyNullValueFunction(),
      SparkUtil.<Tuple, Object>getTuple2Manifest());
  JavaPairRDD<Tuple, Object> pairRDD = new JavaPairRDD<Tuple, Object>(rddPair,
      SparkUtil.getManifest(Tuple.class),
      SparkUtil.getManifest(Object.class));
  // first sort the tuples by the secondary key when useSecondaryKey sorting is enabled
  JavaPairRDD<Tuple, Object> sorted = pairRDD.repartitionAndSortWithinPartitions(
      new HashPartitioner(parallelism),
      new PigSecondaryKeyComparatorSpark(op.getSecondarySortOrder()));
  JavaRDD<Tuple> jrdd = sorted.keys();
  JavaRDD<Tuple2<IndexedKey, Tuple>> jrddPair = jrdd.map(new ToKeyValueFunction(op));
  return jrddPair;
}

Code example from: org.apache.pig/pig. The same secondary-sort handling, for POGlobalRearrangeSpark:

private JavaRDD<Tuple2<IndexedKey,Tuple>> handleSecondarySort(
    RDD<Tuple> rdd, POGlobalRearrangeSpark op, int parallelism) {
  RDD<Tuple2<Tuple, Object>> rddPair = rdd.map(new ToKeyNullValueFunction(),
      SparkUtil.<Tuple, Object>getTuple2Manifest());
  JavaPairRDD<Tuple, Object> pairRDD = new JavaPairRDD<Tuple, Object>(rddPair,
      SparkUtil.getManifest(Tuple.class),
      SparkUtil.getManifest(Object.class));
  // first sort the tuples by the secondary key when useSecondaryKey sorting is enabled
  JavaPairRDD<Tuple, Object> sorted = pairRDD.repartitionAndSortWithinPartitions(
      new HashPartitioner(parallelism),
      new PigSecondaryKeyComparatorSpark(op.getSecondarySortOrder()));
  JavaRDD<Tuple> jrdd = sorted.keys();
  JavaRDD<Tuple2<IndexedKey,Tuple>> jrddPair = jrdd.map(new ToKeyValueFunction(op));
  return jrddPair;
}
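The same repartition-and-sort idiom in plain Spark Java, without Pig's operator classes (a sketch with made-up key types): partition by the grouping field only, but sort within each partition by the full composite key, so each group's values arrive in secondary-key order.

import java.io.Serializable;
import java.util.Comparator;

import org.apache.spark.Partitioner;
import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;

public class SecondarySortSketch {
  // Route records by the grouping field only, ignoring the secondary field,
  // so all records of one group land in the same partition.
  static class GroupPartitioner extends Partitioner {
    private final int numPartitions;
    GroupPartitioner(int numPartitions) { this.numPartitions = numPartitions; }
    @Override public int numPartitions() { return numPartitions; }
    @Override public int getPartition(Object key) {
      @SuppressWarnings("unchecked")
      Tuple2<String, Long> k = (Tuple2<String, Long>) key;
      return ((k._1().hashCode() % numPartitions) + numPartitions) % numPartitions;
    }
  }

  // Order by group first, then by the secondary (e.g. timestamp) field.
  static class FullKeyComparator implements Comparator<Tuple2<String, Long>>, Serializable {
    @Override public int compare(Tuple2<String, Long> a, Tuple2<String, Long> b) {
      int c = a._1().compareTo(b._1());
      return c != 0 ? c : Long.compare(a._2(), b._2());
    }
  }

  static JavaPairRDD<Tuple2<String, Long>, String> secondarySort(
      JavaPairRDD<Tuple2<String, Long>, String> events, int parallelism) {
    // Each partition holds whole groups, pre-sorted by (group, time).
    return events.repartitionAndSortWithinPartitions(
        new GroupPartitioner(parallelism), new FullKeyComparator());
  }
}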
