This article collects code examples of the Java method org.apache.spark.api.java.JavaPairRDD.cogroup(), showing concrete usages of JavaPairRDD.cogroup(). The examples are extracted from selected open-source projects hosted on platforms such as GitHub, Stack Overflow, and Maven, and should serve as useful references. Details of the JavaPairRDD.cogroup() method:
Package path: org.apache.spark.api.java.JavaPairRDD
Class name: JavaPairRDD
Method name: cogroup
Description: for each key present in either pair RDD, cogroup returns a pair RDD whose value is a tuple of the iterables of values for that key from each input (an empty iterable when the key is missing on one side); overloads accept up to three other RDDs, a Partitioner, or a number of partitions.
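To make the semantics concrete before the Spark examples below, here is a minimal local sketch of what cogroup computes, using plain Java collections instead of RDDs (the class and method names here are hypothetical, for illustration only):

```java
import java.util.*;

// Local model of JavaPairRDD.cogroup() semantics: for every key present in
// either input, pair up the values from both sides, using an empty list
// when a key is absent on one side.
public class CogroupSketch {
    public static <K, V, W> Map<K, List<List<Object>>> cogroup(
            Map<K, List<V>> left, Map<K, List<W>> right) {
        Map<K, List<List<Object>>> result = new HashMap<>();
        Set<K> keys = new HashSet<>(left.keySet());
        keys.addAll(right.keySet());
        for (K k : keys) {
            List<Object> l = new ArrayList<>(left.getOrDefault(k, Collections.emptyList()));
            List<Object> r = new ArrayList<>(right.getOrDefault(k, Collections.emptyList()));
            result.put(k, Arrays.asList(l, r));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> categories = new HashMap<>();
        categories.put("Oranges", Arrays.asList("Fruit", "Citrus"));
        categories.put("Apples", Arrays.asList("Fruit"));
        Map<String, List<Integer>> prices = new HashMap<>();
        prices.put("Oranges", Arrays.asList(2));
        Map<String, List<List<Object>>> grouped = cogroup(categories, prices);
        System.out.println(grouped.get("Oranges")); // [[Fruit, Citrus], [2]]
        System.out.println(grouped.get("Apples"));  // [[Fruit], []]
    }
}
```

Note that, unlike a join, cogroup emits every key from either side exactly once, which is why it is a common building block for joins, intersections, and outer joins.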
Code example source: databricks/learning-spark
public static <K, V> JavaPairRDD<K, V> intersectByKey(JavaPairRDD<K, V> rdd1, JavaPairRDD<K, V> rdd2) {
  JavaPairRDD<K, Tuple2<Iterable<V>, Iterable<V>>> grouped = rdd1.cogroup(rdd2);
  return grouped.flatMapValues(new Function<Tuple2<Iterable<V>, Iterable<V>>, Iterable<V>>() {
    @Override
    public Iterable<V> call(Tuple2<Iterable<V>, Iterable<V>> input) {
      ArrayList<V> al = new ArrayList<V>();
      if (!Iterables.isEmpty(input._1()) && !Iterables.isEmpty(input._2())) {
        Iterables.addAll(al, input._1());
        Iterables.addAll(al, input._2());
      }
      return al;
    }
  });
}
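The intersectByKey logic above (keep a key only when it has values on both sides, then concatenate both value lists) can be checked locally without a Spark cluster; here is a hypothetical plain-Java sketch of the same semantics using streams (class and method names are assumptions, not part of the original project):

```java
import java.util.*;
import java.util.stream.*;

// Local model of the cogroup-based intersectByKey: drop keys that are
// missing (or empty) on either side, and merge the values of surviving keys.
public class IntersectByKeySketch {
    public static <K, V> Map<K, List<V>> intersectByKey(
            Map<K, List<V>> left, Map<K, List<V>> right) {
        return left.entrySet().stream()
            .filter(e -> !e.getValue().isEmpty()
                && !right.getOrDefault(e.getKey(), Collections.emptyList()).isEmpty())
            .collect(Collectors.toMap(
                Map.Entry::getKey,
                e -> Stream.concat(
                        e.getValue().stream(),
                        right.get(e.getKey()).stream())
                    .collect(Collectors.toList())));
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> a = new HashMap<>();
        a.put("x", Arrays.asList(1, 2));
        a.put("y", Arrays.asList(3));
        Map<String, List<Integer>> b = new HashMap<>();
        b.put("x", Arrays.asList(4));
        System.out.println(intersectByKey(a, b)); // {x=[1, 2, 4]}
    }
}
```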
Code example source: org.apache.spark/spark-core
@SuppressWarnings("unchecked")
@Test
public void cogroup() {
  JavaPairRDD<String, String> categories = sc.parallelizePairs(Arrays.asList(
    new Tuple2<>("Apples", "Fruit"),
    new Tuple2<>("Oranges", "Fruit"),
    new Tuple2<>("Oranges", "Citrus")
  ));
  JavaPairRDD<String, Integer> prices = sc.parallelizePairs(Arrays.asList(
    new Tuple2<>("Oranges", 2),
    new Tuple2<>("Apples", 3)
  ));
  JavaPairRDD<String, Tuple2<Iterable<String>, Iterable<Integer>>> cogrouped =
    categories.cogroup(prices);
  assertEquals("[Fruit, Citrus]", Iterables.toString(cogrouped.lookup("Oranges").get(0)._1()));
  assertEquals("[2]", Iterables.toString(cogrouped.lookup("Oranges").get(0)._2()));
  cogrouped.collect();
}
Code example source: org.apache.spark/spark-core_2.10
@SuppressWarnings("unchecked")
@Test
public void cogroup3() {
  JavaPairRDD<String, String> categories = sc.parallelizePairs(Arrays.asList(
    new Tuple2<>("Apples", "Fruit"),
    new Tuple2<>("Oranges", "Fruit"),
    new Tuple2<>("Oranges", "Citrus")
  ));
  JavaPairRDD<String, Integer> prices = sc.parallelizePairs(Arrays.asList(
    new Tuple2<>("Oranges", 2),
    new Tuple2<>("Apples", 3)
  ));
  JavaPairRDD<String, Integer> quantities = sc.parallelizePairs(Arrays.asList(
    new Tuple2<>("Oranges", 21),
    new Tuple2<>("Apples", 42)
  ));
  JavaPairRDD<String, Tuple3<Iterable<String>, Iterable<Integer>, Iterable<Integer>>> cogrouped =
    categories.cogroup(prices, quantities);
  assertEquals("[Fruit, Citrus]", Iterables.toString(cogrouped.lookup("Oranges").get(0)._1()));
  assertEquals("[2]", Iterables.toString(cogrouped.lookup("Oranges").get(0)._2()));
  assertEquals("[42]", Iterables.toString(cogrouped.lookup("Apples").get(0)._3()));
  cogrouped.collect();
}
Code example source: org.apache.spark/spark-core
@SuppressWarnings("unchecked")
@Test
public void cogroup4() {
  JavaPairRDD<String, String> categories = sc.parallelizePairs(Arrays.asList(
    new Tuple2<>("Apples", "Fruit"),
    new Tuple2<>("Oranges", "Fruit"),
    new Tuple2<>("Oranges", "Citrus")
  ));
  JavaPairRDD<String, Integer> prices = sc.parallelizePairs(Arrays.asList(
    new Tuple2<>("Oranges", 2),
    new Tuple2<>("Apples", 3)
  ));
  JavaPairRDD<String, Integer> quantities = sc.parallelizePairs(Arrays.asList(
    new Tuple2<>("Oranges", 21),
    new Tuple2<>("Apples", 42)
  ));
  JavaPairRDD<String, String> countries = sc.parallelizePairs(Arrays.asList(
    new Tuple2<>("Oranges", "BR"),
    new Tuple2<>("Apples", "US")
  ));
  JavaPairRDD<String, Tuple4<Iterable<String>, Iterable<Integer>, Iterable<Integer>,
    Iterable<String>>> cogrouped = categories.cogroup(prices, quantities, countries);
  assertEquals("[Fruit, Citrus]", Iterables.toString(cogrouped.lookup("Oranges").get(0)._1()));
  assertEquals("[2]", Iterables.toString(cogrouped.lookup("Oranges").get(0)._2()));
  assertEquals("[42]", Iterables.toString(cogrouped.lookup("Apples").get(0)._3()));
  assertEquals("[BR]", Iterables.toString(cogrouped.lookup("Oranges").get(0)._4()));
  cogrouped.collect();
}
Code example source: DataSystemsLab/GeoSpark
throw new Exception("[OverlayOperator][JoinImage] The back image is not distributed. Please don't use distributed format.");
this.distributedBackRasterImage = this.distributedBackRasterImage.cogroup(distributedFontImage)
    .mapToPair(new PairFunction<Tuple2<Integer, Tuple2<Iterable<ImageSerializableWrapper>, Iterable<ImageSerializableWrapper>>>, Integer, ImageSerializableWrapper>()
Code example source: locationtech/geowave
leftTier.cogroup(rightTier, partitioner);
Code example source: cloudera-labs/envelope
@Override
public Dataset<Row> derive(Map<String, Dataset<Row>> dependencies) throws Exception {
  if (!dependencies.containsKey(intoDependency)) {
    throw new RuntimeException("Nest deriver points to non-existent nest-into dependency");
  }
  Dataset<Row> into = dependencies.get(intoDependency);
  if (!dependencies.containsKey(fromDependency)) {
    throw new RuntimeException("Nest deriver points to non-existent nest-from dependency");
  }
  Dataset<Row> from = dependencies.get(fromDependency);
  ExtractFieldsFunction extractFieldsFunction = new ExtractFieldsFunction(keyFieldNames);
  JavaPairRDD<List<Object>, Row> keyedIntoRDD = into.javaRDD().keyBy(extractFieldsFunction);
  JavaPairRDD<List<Object>, Row> keyedFromRDD = from.javaRDD().keyBy(extractFieldsFunction);
  NestFunction nestFunction = new NestFunction();
  JavaRDD<Row> nestedRDD = keyedIntoRDD.cogroup(keyedFromRDD).values().map(nestFunction);
  StructType nestedSchema = into.schema().add(nestedFieldName, DataTypes.createArrayType(from.schema()));
  Dataset<Row> nested = into.sqlContext().createDataFrame(nestedRDD, nestedSchema);
  return nested;
}
Code example source: org.datavec/datavec-spark
leftJV.cogroup(rightJV);
Code example source: scipr-lab/dizk
JavaPairRDD<Long, Tuple2<Iterable<Tuple2<Long, FieldT>>, Iterable<Tuple2<Long, FieldT>>>> cogroupResult = aFullRDD.cogroup(bFullRDD);
Code example source: org.qcri.rheem/rheem-spark
@Override
public Tuple<Collection<ExecutionLineageNode>, Collection<ChannelInstance>> evaluate(
    ChannelInstance[] inputs,
    ChannelInstance[] outputs,
    SparkExecutor sparkExecutor,
    OptimizationContext.OperatorContext operatorContext) {
  assert inputs.length == this.getNumInputs();
  assert outputs.length == this.getNumOutputs();
  final RddChannel.Instance input0 = (RddChannel.Instance) inputs[0];
  final RddChannel.Instance input1 = (RddChannel.Instance) inputs[1];
  final RddChannel.Instance output = (RddChannel.Instance) outputs[0];
  final JavaRDD<In0> inputRdd0 = input0.provideRdd();
  final JavaRDD<In1> inputRdd1 = input1.provideRdd();
  FunctionCompiler compiler = sparkExecutor.getCompiler();
  final PairFunction<In0, Key, In0> keyExtractor0 = compiler.compileToKeyExtractor(this.keyDescriptor0);
  final PairFunction<In1, Key, In1> keyExtractor1 = compiler.compileToKeyExtractor(this.keyDescriptor1);
  JavaPairRDD<Key, In0> pairRdd0 = inputRdd0.mapToPair(keyExtractor0);
  JavaPairRDD<Key, In1> pairRdd1 = inputRdd1.mapToPair(keyExtractor1);
  final JavaPairRDD<Key, scala.Tuple2<Iterable<In0>, Iterable<In1>>> outputPair =
      pairRdd0.cogroup(pairRdd1, sparkExecutor.getNumDefaultPartitions());
  this.name(outputPair);
  // Map the output to what Rheem expects.
  final JavaRDD<Tuple2<Iterable<In0>, Iterable<In1>>> outputRdd = outputPair.map(new TupleConverter<>());
  this.name(outputRdd);
  output.accept(outputRdd, sparkExecutor);
  return ExecutionOperator.modelLazyExecution(inputs, outputs, operatorContext);
}
Code example source: scipr-lab/dizk
JavaPairRDD<Long, FieldT> xm = xTransformed.cogroup(meanTransformed).flatMapToPair(x -> {
  return xMinusMeanAssignment(fieldFactory, x._1(), x._2(), n, d, outputAssignmentIndexer);
});
The content above is collected from the Internet; if there is any infringement, please contact the author for removal.