Usage and code examples of the org.apache.spark.api.java.JavaPairRDD.takeOrdered() method

x33g5p2x, reposted on 2022-01-21 (category: Other)

This article collects a number of code examples for the Java method org.apache.spark.api.java.JavaPairRDD.takeOrdered() and shows how it is used in practice. The examples come from selected open-source projects on platforms such as GitHub, Stack Overflow, and Maven, and should be useful as references. Details of JavaPairRDD.takeOrdered() follow:
Package: org.apache.spark.api.java
Class: JavaPairRDD
Method: takeOrdered

About JavaPairRDD.takeOrdered

takeOrdered(int num, Comparator<T> comp) is an action that returns the first num elements of the RDD, in the order defined by the given comparator, as a java.util.List on the driver. For a JavaPairRDD<K, V> the element type T is scala.Tuple2<K, V>, so the comparator ranks key/value pairs. An overload takeOrdered(int num) uses the natural ordering instead. The comparator is shipped to the executors, so it must be java.io.Serializable; and because the result is collected to the driver, num should stay small.
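
Before the collected snippets, here is a minimal, self-contained sketch of the call; the class name and sample data are illustrative, not taken from any of the projects below.

import java.io.Serializable;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class TakeOrderedDemo {

  // Comparators passed to takeOrdered must be Serializable,
  // because Spark ships them to the executors.
  static class ValueDescending implements Comparator<Tuple2<String, Integer>>, Serializable {
    @Override
    public int compare(Tuple2<String, Integer> a, Tuple2<String, Integer> b) {
      return b._2().compareTo(a._2()); // larger values rank first
    }
  }

  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext("local[*]", "takeOrdered-demo");

    JavaPairRDD<String, Integer> counts = sc.parallelizePairs(Arrays.asList(
        new Tuple2<>("a", 3),
        new Tuple2<>("b", 1),
        new Tuple2<>("c", 7)));

    // The two pairs with the highest values, returned to the driver as a List.
    List<Tuple2<String, Integer>> top2 = counts.takeOrdered(2, new ValueDescending());
    System.out.println(top2); // [(c,7), (a,3)]

    sc.stop();
  }
}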

Code examples

Code example source: databricks/learning-spark

// Callback of an anonymous Function applied to each batch RDD, e.g. via
// DStream.foreachRDD(): keep the 10 pairs ranked highest by the comparator.
public Void call(JavaPairRDD<String, Long> rdd) {
  currentTopEndpoints = rdd.takeOrdered(
      10,
      new Functions.ValueComparator<String, Long>(cmp));
  return null;
}}); // closes the anonymous Function and the enclosing call
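
The Functions.ValueComparator used above is defined elsewhere in the learning-spark project and is not shown on this page. The sketch below is a reconstruction for illustration only; the exact names and details are assumptions, not the project's source:

import java.io.Serializable;
import java.util.Comparator;

import scala.Tuple2;

// Reconstructed for illustration: orders (key, value) pairs by their value
// using a wrapped value comparator. Both this class and the wrapped
// comparator must be Serializable, since Spark ships them to executors.
class ValueComparator<K, V> implements Comparator<Tuple2<K, V>>, Serializable {
  private final Comparator<V> cmp;

  ValueComparator(Comparator<V> cmp) {
    this.cmp = cmp;
  }

  @Override
  public int compare(Tuple2<K, V> a, Tuple2<K, V> b) {
    return cmp.compare(a._2(), b._2());
  }
}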

Code example source: mahmoudparsian/data-algorithms-book

// The K records with the lowest AVF scores are the outlier candidates.
List<Tuple2<String,Double>> outliers = avfScore.takeOrdered(K, TupleComparatorAscending.INSTANCE);
System.out.println("Ascending AVF Score:");
System.out.println(outliers);
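
TupleComparatorAscending.INSTANCE is defined in the book's repository and not shown here. A plausible sketch (an assumption, not the book's exact source) is an enum singleton that orders pairs by their Double value ascending, so the K lowest AVF scores come first:

import java.util.Comparator;

import scala.Tuple2;

// Illustrative reconstruction: ascending order on the Double value.
// An enum singleton is Serializable by default, which takeOrdered requires.
enum TupleComparatorAscending implements Comparator<Tuple2<String, Double>> {
  INSTANCE;

  @Override
  public int compare(Tuple2<String, Double> a, Tuple2<String, Double> b) {
    return a._2().compareTo(b._2());
  }
}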

Code example source: mahmoudparsian/data-algorithms-book

// Take the top N pairs as ranked by MyTupleComparator.
List<Tuple2<String, Integer>> topNResult = uniqueKeys.takeOrdered(N, MyTupleComparator.INSTANCE);
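
MyTupleComparator.INSTANCE is likewise project-specific; for a top-N count query it has to rank higher counts first. Note that Spark offers two symmetric ways to get the same result: takeOrdered(N, comp) keeps the N "smallest" elements under comp, while top(N, comp) keeps the N "largest". A sketch of both, using an illustrative ascending comparator of our own (not the project's class):

import java.io.Serializable;
import java.util.Comparator;
import java.util.List;

import org.apache.spark.api.java.JavaPairRDD;

import scala.Tuple2;

public class TopNByValue {

  // Illustrative comparator (not the project's class): ascending by value.
  static class ValueAscending implements Comparator<Tuple2<String, Integer>>, Serializable {
    @Override
    public int compare(Tuple2<String, Integer> a, Tuple2<String, Integer> b) {
      return a._2().compareTo(b._2());
    }
  }

  static List<Tuple2<String, Integer>> viaTop(JavaPairRDD<String, Integer> uniqueKeys, int n) {
    // top() returns the n largest elements under an ascending comparator.
    return uniqueKeys.top(n, new ValueAscending());
  }

  static List<Tuple2<String, Integer>> viaTakeOrdered(JavaPairRDD<String, Integer> uniqueKeys, int n) {
    // takeOrdered() returns the n smallest, so reverse the comparator.
    // Comparator.reversed() stays Serializable when the base comparator is.
    return uniqueKeys.takeOrdered(n, new ValueAscending().reversed());
  }
}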

Code example source: ypriverol/spark-java8

List<Tuple2<String, Integer>> topNResult = uniqueKeys.takeOrdered(N, MyTupleComparator.INSTANCE);

Code example source: org.datavec/datavec-spark_2.11 (the org.datavec/datavec-spark artifact ships an identical copy of this example)

/**
   * Sample the N most frequently occurring values in the specified column
   *
   * @param nMostFrequent    Top N values to sample
   * @param columnName       Name of the column to sample from
   * @param schema           Schema of the data
   * @param data             RDD containing the data
   * @return                 Map of the most frequently occurring Writable objects in that column to their counts
   */
  public static Map<Writable, Long> sampleMostFrequentFromColumn(int nMostFrequent, String columnName, Schema schema,
          JavaRDD<List<Writable>> data) {
    int columnIdx = schema.getIndexOfColumn(columnName);

    // Count occurrences per distinct value: (value, 1) pairs reduced by key.
    JavaPairRDD<Writable, Long> keyedByWritable = data.mapToPair(new ColumnToKeyPairTransform(columnIdx));
    JavaPairRDD<Writable, Long> reducedByWritable = keyedByWritable.reduceByKey(new SumLongsFunction2());

    // Keep the nMostFrequent pairs under the comparator (descending by
    // count here, given the top-N intent of the method).
    List<Tuple2<Writable, Long>> list =
            reducedByWritable.takeOrdered(nMostFrequent, new Tuple2Comparator<Writable>(false));

    List<Tuple2<Writable, Long>> sorted = new ArrayList<>(list);
    Collections.sort(sorted, new Tuple2Comparator<Writable>(false));

    // LinkedHashMap preserves the frequency ordering for the caller.
    Map<Writable, Long> map = new LinkedHashMap<>();
    for (Tuple2<Writable, Long> t2 : sorted) {
      map.put(t2._1(), t2._2());
    }

    return map;
  }
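
One detail worth noting in the snippet above: takeOrdered already returns its elements in the order defined by the comparator, so the extra Collections.sort pass appears to be defensive rather than strictly necessary; the LinkedHashMap then simply hands that order on to the caller.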
