This article collects code examples for the org.apache.spark.api.java.JavaPairRDD.takeOrdered() method in Java and shows how JavaPairRDD.takeOrdered() is used in practice. The examples are extracted from selected projects hosted on platforms such as GitHub, Stack Overflow, and Maven, so they make useful references. Details of the JavaPairRDD.takeOrdered() method:
Package: org.apache.spark.api.java
Class: JavaPairRDD
Method: takeOrdered
Description: takeOrdered(int num, Comparator<Tuple2<K, V>> comp) returns the num smallest elements of the pair RDD, as determined by the given Comparator, collected to the driver as a java.util.List<Tuple2<K, V>>.
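As a quick orientation, here is a minimal, self-contained sketch of the call, assuming a JavaSparkContext named sc and the usual Spark and scala.Tuple2 imports (the variable names are illustrative and not taken from the projects below):

// Build a small pair RDD and take the two entries with the smallest values.
JavaPairRDD<String, Integer> pairs = sc.parallelizePairs(Arrays.asList(
        new Tuple2<>("a", 3), new Tuple2<>("b", 1), new Tuple2<>("c", 2)));

// The Comparator is shipped to the executors, so it must be Serializable;
// the intersection cast makes the lambda serializable.
List<Tuple2<String, Integer>> smallestTwo = pairs.takeOrdered(2,
        (Comparator<Tuple2<String, Integer>> & Serializable)
                (t1, t2) -> Integer.compare(t1._2(), t2._2()));
// smallestTwo: [(b,1), (c,2)]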
代码示例来源:origin: databricks/learning-spark
public Void call(JavaPairRDD<String, Long> rdd) {
  // Keep the 10 entries that the value comparator cmp ranks first.
  currentTopEndpoints = rdd.takeOrdered(
      10,
      new Functions.ValueComparator<String, Long>(cmp));
  return null;
}});
}
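Functions.ValueComparator here is a helper from the learning-spark project, not part of Spark itself. A hypothetical sketch of what such a value-based comparator might look like (the class name and constructor are assumptions; the actual implementation in the repository may differ):

// Compares Tuple2 pairs by value only, delegating to a wrapped Comparator<V>.
// It implements Serializable because takeOrdered ships the comparator to the
// executors (the wrapped comparator must be serializable as well).
class ValueComparator<K, V> implements Comparator<Tuple2<K, V>>, Serializable {
    private final Comparator<V> cmp;

    ValueComparator(Comparator<V> cmp) {
        this.cmp = cmp;
    }

    @Override
    public int compare(Tuple2<K, V> a, Tuple2<K, V> b) {
        return cmp.compare(a._2(), b._2());
    }
}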
代码示例来源:origin: mahmoudparsian/data-algorithms-book
List<Tuple2<String,Double>> outliers = avfScore.takeOrdered(K, TupleComparatorAscending.INSTANCE);
System.out.println("Ascending AVF Score:");
System.out.println(outliers);
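TupleComparatorAscending.INSTANCE is defined in the data-algorithms-book project. A hypothetical sketch of such a serializable singleton comparator, ordering pairs by their Double value in ascending order so that the K lowest AVF scores come first (the real class may be written differently):

// Enum singletons are Serializable by definition, which makes them a
// convenient way to pass a stateless Comparator to takeOrdered().
enum TupleComparatorAscending implements Comparator<Tuple2<String, Double>> {
    INSTANCE;

    @Override
    public int compare(Tuple2<String, Double> a, Tuple2<String, Double> b) {
        return a._2().compareTo(b._2()); // smaller AVF scores first
    }
}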
代码示例来源:origin: mahmoudparsian/data-algorithms-book
List<Tuple2<String, Integer>> topNResult = uniqueKeys.takeOrdered(N, MyTupleComparator.INSTANCE);
代码示例来源:origin: ypriverol/spark-java8
List<Tuple2<String, Integer>> topNResult = uniqueKeys.takeOrdered(N, MyTupleComparator.INSTANCE);
代码示例来源:origin: org.datavec/datavec-spark_2.11
/**
 * Sample the N most frequently occurring values in the specified column
 *
 * @param nMostFrequent Top N values to sample
 * @param columnName    Name of the column to sample from
 * @param schema        Schema of the data
 * @param data          RDD containing the data
 * @return Map of the most frequently occurring Writable objects in that column to their counts
 */
public static Map<Writable, Long> sampleMostFrequentFromColumn(int nMostFrequent, String columnName, Schema schema,
                JavaRDD<List<Writable>> data) {
    int columnIdx = schema.getIndexOfColumn(columnName);
    JavaPairRDD<Writable, Long> keyedByWritable = data.mapToPair(new ColumnToKeyPairTransform(columnIdx));
    JavaPairRDD<Writable, Long> reducedByWritable = keyedByWritable.reduceByKey(new SumLongsFunction2());
    List<Tuple2<Writable, Long>> list =
                    reducedByWritable.takeOrdered(nMostFrequent, new Tuple2Comparator<Writable>(false));
    List<Tuple2<Writable, Long>> sorted = new ArrayList<>(list);
    Collections.sort(sorted, new Tuple2Comparator<Writable>(false));
    Map<Writable, Long> map = new LinkedHashMap<>();
    for (Tuple2<Writable, Long> t2 : sorted) {
        map.put(t2._1(), t2._2());
    }
    return map;
}
}
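Stripped of the DataVec-specific types (Writable, Schema, and the project's helper functions), the same top-N-by-frequency pattern can be written with plain Spark types. A minimal sketch, assuming a JavaSparkContext named sc; the data and names are made up for illustration:

// Count how often each value occurs, then keep the two most frequent ones.
JavaRDD<String> values = sc.parallelize(Arrays.asList("a", "b", "a", "c", "a", "b"));

JavaPairRDD<String, Long> counts = values
        .mapToPair(v -> new Tuple2<>(v, 1L))
        .reduceByKey(Long::sum);

// Order by descending count so takeOrdered returns the top N; the
// intersection cast keeps the lambda Serializable for Spark.
List<Tuple2<String, Long>> top2 = counts.takeOrdered(2,
        (Comparator<Tuple2<String, Long>> & Serializable)
                (t1, t2) -> Long.compare(t2._2(), t1._2()));
// top2: [(a,3), (b,2)]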