本文整理了Java中org.apache.spark.api.java.JavaPairRDD.<init>()
方法的一些代码示例,展示了JavaPairRDD.<init>()
的具体用法。这些代码示例主要来源于Github
/Stackoverflow
/Maven
等平台,是从一些精选项目中提取出来的代码,具有较强的参考意义,能在一定程度帮忙到你。JavaPairRDD.<init>()
方法的具体详情如下:
包路径:org.apache.spark.api.java.JavaPairRDD
类名称:JavaPairRDD
方法名:<init>
暂无
代码示例来源:origin: com.datastax.spark/spark-cassandra-connector-unshaded
/**
* Groups items with the same key, assuming the items with the same key are next to each other in the
* collection. It does not perform shuffle, therefore it is much faster than using much more
* universal Spark RDD `groupByKey`. For this method to be useful with Cassandra tables, the key must
* represent a prefix of the primary key, containing at least the partition key of the Cassandra
* table.
*/
public JavaPairRDD<K, Collection<V>> spanByKey(ClassTag<K> keyClassTag) {
ClassTag<Tuple2<K, Collection<V>>> tupleClassTag = classTag(Tuple2.class);
ClassTag<Collection<V>> vClassTag = classTag(Collection.class);
RDD<Tuple2<K, Collection<V>>> newRDD = pairRDDFunctions.spanByKey()
.map(JavaApiHelper.<K, V, Seq<V>>valuesAsJavaCollection(), tupleClassTag);
return new JavaPairRDD<>(newRDD, keyClassTag, vClassTag);
}
}
代码示例来源:origin: com.datastax.spark/spark-cassandra-connector
/**
* Groups items with the same key, assuming the items with the same key are next to each other in the
* collection. It does not perform shuffle, therefore it is much faster than using much more
* universal Spark RDD `groupByKey`. For this method to be useful with Cassandra tables, the key must
* represent a prefix of the primary key, containing at least the partition key of the Cassandra
* table.
*/
public JavaPairRDD<K, Collection<V>> spanByKey(ClassTag<K> keyClassTag) {
ClassTag<Tuple2<K, Collection<V>>> tupleClassTag = classTag(Tuple2.class);
ClassTag<Collection<V>> vClassTag = classTag(Collection.class);
RDD<Tuple2<K, Collection<V>>> newRDD = pairRDDFunctions.spanByKey()
.map(JavaApiHelper.<K, V, Seq<V>>valuesAsJavaCollection(), tupleClassTag);
return new JavaPairRDD<>(newRDD, keyClassTag, vClassTag);
}
}
代码示例来源:origin: com.datastax.spark/spark-cassandra-connector_2.10
/**
* Groups items with the same key, assuming the items with the same key are next to each other in the
* collection. It does not perform shuffle, therefore it is much faster than using much more
* universal Spark RDD `groupByKey`. For this method to be useful with Cassandra tables, the key must
* represent a prefix of the primary key, containing at least the partition key of the Cassandra
* table.
*/
public JavaPairRDD<K, Collection<V>> spanByKey(ClassTag<K> keyClassTag) {
ClassTag<Tuple2<K, Collection<V>>> tupleClassTag = classTag(Tuple2.class);
ClassTag<Collection<V>> vClassTag = classTag(Collection.class);
RDD<Tuple2<K, Collection<V>>> newRDD = pairRDDFunctions.spanByKey()
.map(JavaApiHelper.<K, V, Seq<V>>valuesAsJavaCollection(), tupleClassTag);
return new JavaPairRDD<>(newRDD, keyClassTag, vClassTag);
}
}
代码示例来源:origin: com.datastax.spark/spark-cassandra-connector-java
/**
* Groups items with the same key, assuming the items with the same key are next to each other in the
* collection. It does not perform shuffle, therefore it is much faster than using much more
* universal Spark RDD `groupByKey`. For this method to be useful with Cassandra tables, the key must
* represent a prefix of the primary key, containing at least the partition key of the Cassandra
* table.
*/
public JavaPairRDD<K, Collection<V>> spanByKey(ClassTag<K> keyClassTag) {
ClassTag<Tuple2<K, Collection<V>>> tupleClassTag = classTag(Tuple2.class);
ClassTag<Collection<V>> vClassTag = classTag(Collection.class);
RDD<Tuple2<K, Collection<V>>> newRDD = pairRDDFunctions.spanByKey()
.map(JavaApiHelper.<K, V, Seq<V>>valuesAsJavaCollection(), tupleClassTag);
return new JavaPairRDD<>(newRDD, keyClassTag, vClassTag);
}
}
代码示例来源:origin: com.datastax.spark/spark-cassandra-connector-java_2.10
/**
* Groups items with the same key, assuming the items with the same key are next to each other in the
* collection. It does not perform shuffle, therefore it is much faster than using much more
* universal Spark RDD `groupByKey`. For this method to be useful with Cassandra tables, the key must
* represent a prefix of the primary key, containing at least the partition key of the Cassandra
* table.
*/
public JavaPairRDD<K, Collection<V>> spanByKey(ClassTag<K> keyClassTag) {
ClassTag<Tuple2<K, Collection<V>>> tupleClassTag = classTag(Tuple2.class);
ClassTag<Collection<V>> vClassTag = classTag(Collection.class);
RDD<Tuple2<K, Collection<V>>> newRDD = pairRDDFunctions.spanByKey()
.map(JavaApiHelper.<K, V, Seq<V>>valuesAsJavaCollection(), tupleClassTag);
return new JavaPairRDD<>(newRDD, keyClassTag, vClassTag);
}
}
代码示例来源:origin: com.datastax.spark/spark-cassandra-connector
/**
* Applies a function to each item, and groups consecutive items having the same value together.
* Contrary to {@code groupBy}, items from the same group must be already next to each other in the
* original collection. Works locally on each partition, so items from different partitions will
* never be placed in the same group.
*/
public <U> JavaPairRDD<U, Iterable<T>> spanBy(final Function<T, U> f, ClassTag<U> keyClassTag) {
ClassTag<Tuple2<U, Iterable<T>>> tupleClassTag = classTag(Tuple2.class);
ClassTag<Iterable<T>> iterableClassTag = CassandraJavaUtil.classTag(Iterable.class);
RDD<Tuple2<U, Iterable<T>>> newRDD = rddFunctions.spanBy(toScalaFunction1(f))
.map(JavaApiHelper.<U, T, scala.collection.Iterable<T>>valuesAsJavaIterable(), tupleClassTag);
return new JavaPairRDD<>(newRDD, keyClassTag, iterableClassTag);
}
代码示例来源:origin: com.datastax.spark/spark-cassandra-connector-java_2.10
/**
* Applies a function to each item, and groups consecutive items having the same value together.
* Contrary to {@code groupBy}, items from the same group must be already next to each other in the
* original collection. Works locally on each partition, so items from different partitions will
* never be placed in the same group.
*/
public <U> JavaPairRDD<U, Iterable<T>> spanBy(final Function<T, U> f, ClassTag<U> keyClassTag) {
ClassTag<Tuple2<U, Iterable<T>>> tupleClassTag = classTag(Tuple2.class);
ClassTag<Iterable<T>> iterableClassTag = CassandraJavaUtil.classTag(Iterable.class);
RDD<Tuple2<U, Iterable<T>>> newRDD = rddFunctions.spanBy(toScalaFunction1(f))
.map(JavaApiHelper.<U, T, scala.collection.Iterable<T>>valuesAsJavaIterable(), tupleClassTag);
return new JavaPairRDD<>(newRDD, keyClassTag, iterableClassTag);
}
代码示例来源:origin: com.datastax.spark/spark-cassandra-connector-java
/**
* Applies a function to each item, and groups consecutive items having the same value together.
* Contrary to {@code groupBy}, items from the same group must be already next to each other in the
* original collection. Works locally on each partition, so items from different partitions will
* never be placed in the same group.
*/
public <U> JavaPairRDD<U, Iterable<T>> spanBy(final Function<T, U> f, ClassTag<U> keyClassTag) {
ClassTag<Tuple2<U, Iterable<T>>> tupleClassTag = classTag(Tuple2.class);
ClassTag<Iterable<T>> iterableClassTag = CassandraJavaUtil.classTag(Iterable.class);
RDD<Tuple2<U, Iterable<T>>> newRDD = rddFunctions.spanBy(toScalaFunction1(f))
.map(JavaApiHelper.<U, T, scala.collection.Iterable<T>>valuesAsJavaIterable(), tupleClassTag);
return new JavaPairRDD<>(newRDD, keyClassTag, iterableClassTag);
}
代码示例来源:origin: com.datastax.spark/spark-cassandra-connector_2.10
/**
* Applies a function to each item, and groups consecutive items having the same value together.
* Contrary to {@code groupBy}, items from the same group must be already next to each other in the
* original collection. Works locally on each partition, so items from different partitions will
* never be placed in the same group.
*/
public <U> JavaPairRDD<U, Iterable<T>> spanBy(final Function<T, U> f, ClassTag<U> keyClassTag) {
ClassTag<Tuple2<U, Iterable<T>>> tupleClassTag = classTag(Tuple2.class);
ClassTag<Iterable<T>> iterableClassTag = CassandraJavaUtil.classTag(Iterable.class);
RDD<Tuple2<U, Iterable<T>>> newRDD = rddFunctions.spanBy(toScalaFunction1(f))
.map(JavaApiHelper.<U, T, scala.collection.Iterable<T>>valuesAsJavaIterable(), tupleClassTag);
return new JavaPairRDD<>(newRDD, keyClassTag, iterableClassTag);
}
代码示例来源:origin: com.datastax.spark/spark-cassandra-connector-unshaded
/**
* Applies a function to each item, and groups consecutive items having the same value together.
* Contrary to {@code groupBy}, items from the same group must be already next to each other in the
* original collection. Works locally on each partition, so items from different partitions will
* never be placed in the same group.
*/
public <U> JavaPairRDD<U, Iterable<T>> spanBy(final Function<T, U> f, ClassTag<U> keyClassTag) {
ClassTag<Tuple2<U, Iterable<T>>> tupleClassTag = classTag(Tuple2.class);
ClassTag<Iterable<T>> iterableClassTag = CassandraJavaUtil.classTag(Iterable.class);
RDD<Tuple2<U, Iterable<T>>> newRDD = rddFunctions.spanBy(toScalaFunction1(f))
.map(JavaApiHelper.<U, T, scala.collection.Iterable<T>>valuesAsJavaIterable(), tupleClassTag);
return new JavaPairRDD<>(newRDD, keyClassTag, iterableClassTag);
}
代码示例来源:origin: org.apache.pig/pig
@Override
public RDD<Tuple> convert(List<RDD<Tuple>> predecessors, POSort sortOperator)
throws IOException {
SparkUtil.assertPredecessorSize(predecessors, sortOperator, 1);
RDD<Tuple> rdd = predecessors.get(0);
int parallelism = SparkPigContext.get().getParallelism(predecessors, sortOperator);
RDD<Tuple2<Tuple, Object>> rddPair = rdd.map(new ToKeyValueFunction(),
SparkUtil.<Tuple, Object> getTuple2Manifest());
JavaPairRDD<Tuple, Object> r = new JavaPairRDD<Tuple, Object>(rddPair,
SparkUtil.getManifest(Tuple.class),
SparkUtil.getManifest(Object.class));
JavaPairRDD<Tuple, Object> sorted = r.sortByKey(
sortOperator.getMComparator(), true, parallelism);
JavaRDD<Tuple> mapped = sorted.mapPartitions(TO_VALUE_FUNCTION);
return mapped.rdd();
}
代码示例来源:origin: org.apache.pig/pig
JavaPairRDD<PartitionIndexedKey, Tuple> skewIndexedJavaPairRDD = new JavaPairRDD<PartitionIndexedKey, Tuple>(
skewIdxKeyRDD, SparkUtil.getManifest(PartitionIndexedKey.class),
SparkUtil.getManifest(Tuple.class));
JavaPairRDD<PartitionIndexedKey, Tuple> streamIndexedJavaPairRDD = new JavaPairRDD<PartitionIndexedKey, Tuple>(
streamIdxKeyJavaRDD.rdd(), SparkUtil.getManifest(PartitionIndexedKey.class),
SparkUtil.getManifest(Tuple.class));
代码示例来源:origin: org.apache.pig/pig
@Override
public RDD<Tuple> convert(List<RDD<Tuple>> predecessors, POSampleSortSpark sortOperator)
throws IOException {
SparkUtil.assertPredecessorSize(predecessors, sortOperator, 1);
RDD<Tuple> rdd = predecessors.get(0);
RDD<Tuple2<Tuple, Object>> rddPair = rdd.map(new ToKeyValueFunction(),
SparkUtil.<Tuple, Object> getTuple2Manifest());
JavaPairRDD<Tuple, Object> r = new JavaPairRDD<Tuple, Object>(rddPair,
SparkUtil.getManifest(Tuple.class),
SparkUtil.getManifest(Object.class));
//sort sample data
JavaPairRDD<Tuple, Object> sorted = r.sortByKey(true);
//convert every element in sample data from element to (all, element) format
JavaPairRDD<String, Tuple> mapped = sorted.mapPartitionsToPair(new AggregateFunction());
//use groupByKey to aggregate all values( the format will be ((all),{(sampleEle1),(sampleEle2),...} )
JavaRDD<Tuple> groupByKey= mapped.groupByKey().map(new ToValueFunction());
return groupByKey.rdd();
}
代码示例来源:origin: org.apache.pig/pig
private JavaRDD<Tuple2<IndexedKey, Tuple>> handleSecondarySort(
RDD<Tuple> rdd, POReduceBySpark op, int parallelism) {
RDD<Tuple2<Tuple, Object>> rddPair = rdd.map(new ToKeyNullValueFunction(),
SparkUtil.<Tuple, Object>getTuple2Manifest());
JavaPairRDD<Tuple, Object> pairRDD = new JavaPairRDD<Tuple, Object>(rddPair,
SparkUtil.getManifest(Tuple.class),
SparkUtil.getManifest(Object.class));
//first sort the tuple by secondary key if enable useSecondaryKey sort
JavaPairRDD<Tuple, Object> sorted = pairRDD.repartitionAndSortWithinPartitions(
new HashPartitioner(parallelism),
new PigSecondaryKeyComparatorSpark(op.getSecondarySortOrder()));
JavaRDD<Tuple> jrdd = sorted.keys();
JavaRDD<Tuple2<IndexedKey, Tuple>> jrddPair = jrdd.map(new ToKeyValueFunction(op));
return jrddPair;
}
代码示例来源:origin: org.apache.pig/pig
private JavaRDD<Tuple2<IndexedKey,Tuple>> handleSecondarySort(
RDD<Tuple> rdd, POGlobalRearrangeSpark op, int parallelism) {
RDD<Tuple2<Tuple, Object>> rddPair = rdd.map(new ToKeyNullValueFunction(),
SparkUtil.<Tuple, Object>getTuple2Manifest());
JavaPairRDD<Tuple, Object> pairRDD = new JavaPairRDD<Tuple, Object>(rddPair,
SparkUtil.getManifest(Tuple.class),
SparkUtil.getManifest(Object.class));
//first sort the tuple by secondary key if enable useSecondaryKey sort
JavaPairRDD<Tuple, Object> sorted = pairRDD.repartitionAndSortWithinPartitions(
new HashPartitioner(parallelism),
new PigSecondaryKeyComparatorSpark(op.getSecondarySortOrder()));
JavaRDD<Tuple> jrdd = sorted.keys();
JavaRDD<Tuple2<IndexedKey,Tuple>> jrddPair = jrdd.map(new ToKeyValueFunction(op));
return jrddPair;
}
内容来源于网络,如有侵权,请联系作者删除!