Usage of the org.apache.spark.rdd.RDD.getNumPartitions() method, with code examples

This article collects Java code examples for the org.apache.spark.rdd.RDD.getNumPartitions method and shows how RDD.getNumPartitions is used in practice. The examples are taken from selected projects on GitHub, Stack Overflow, Maven, and similar platforms, so they should serve as useful references. Details of the RDD.getNumPartitions method:
Package: org.apache.spark.rdd
Class name: RDD
Method name: getNumPartitions

About RDD.getNumPartitions

Returns the number of partitions of this RDD.
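A minimal usage sketch (the local master setting, input data, and slice count below are illustrative assumptions, not taken from any of the projects quoted later):

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class GetNumPartitionsDemo {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("getNumPartitions-demo").setMaster("local[4]");
    JavaSparkContext jsc = new JavaSparkContext(conf);

    // parallelize with an explicit number of slices; the underlying RDD reports that count back
    JavaRDD<Integer> numbers = jsc.parallelize(Arrays.asList(1, 2, 3, 4, 5, 6), 3);
    System.out.println(numbers.rdd().getNumPartitions()); // prints 3

    jsc.stop();
  }
}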

Code examples

Code example from: uber/marmaray

private int calculateHiveNumPartitions(@NonNull final Dataset<Row> data) {
  /*
   * For now we just return the number of partitions in the underlying RDD, but in the future we can define
   * the type of strategy in the configuration and heuristically calculate the number of partitions.
   *
   * todo: T923425 to actually do the heuristic calculation to optimize num partitions
   */
  return data.rdd().getNumPartitions();
}
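The strategy above simply mirrors the partitioning of the Dataset's underlying RDD, so anything that changes that partitioning upstream also changes the returned value. A rough sketch of that relationship in plain Spark SQL (the local session, column name, and partition counts are assumptions for illustration, not marmaray code):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class DatasetNumPartitionsSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("dataset-num-partitions-sketch")
        .master("local[4]")
        .getOrCreate();

    Dataset<Row> data = spark.range(1000).toDF("id");
    System.out.println(data.rdd().getNumPartitions());          // partitioning chosen by the source

    Dataset<Row> repartitioned = data.repartition(8);
    System.out.println(repartitioned.rdd().getNumPartitions()); // 8, following the explicit repartition

    spark.stop();
  }
}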

Code example from: org.apache.pig/pig

public static int getParallelism(List<RDD<Tuple>> predecessors,
                                 PhysicalOperator physicalOperator) {
  if (defaultParallelism != null) {
    return getDefaultParallelism();
  }
  int parallelism = physicalOperator.getRequestedParallelism();
  if (parallelism <= 0) {
    // Spark automatically sets the number of "map" tasks to run on each file according to its size (though
    // you can control it through optional parameters to SparkContext.textFile, etc), and for distributed
    // "reduce" operations, such as groupByKey and reduceByKey, it uses the largest parent RDD's number of
    // partitions.
    int maxParallelism = 0;
    for (int i = 0; i < predecessors.size(); i++) {
      int tmpParallelism = predecessors.get(i).getNumPartitions();
      if (tmpParallelism > maxParallelism) {
        maxParallelism = tmpParallelism;
      }
    }
    parallelism = maxParallelism;
  }
  return parallelism;
}
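The fallback mirrors Spark's own default for shuffles: when no explicit partition count (and no spark.default.parallelism) is given, operations such as groupByKey, reduceByKey, or join use the largest parent RDD's number of partitions. A small standalone sketch of that behavior (the data and partition counts are illustrative assumptions, not Pig code):

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class LargestParentParallelismSketch {
  public static void main(String[] args) {
    // spark.default.parallelism is deliberately left unset so the shuffle falls back to the parents
    SparkConf conf = new SparkConf().setAppName("largest-parent-sketch").setMaster("local[4]");
    JavaSparkContext jsc = new JavaSparkContext(conf);

    List<Tuple2<String, Integer>> pairs = Arrays.asList(
        new Tuple2<>("a", 1), new Tuple2<>("b", 2), new Tuple2<>("a", 3));

    JavaPairRDD<String, Integer> small = jsc.parallelizePairs(pairs, 2);
    JavaPairRDD<String, Integer> large = jsc.parallelizePairs(pairs, 6);

    // the join has parents with 2 and 6 partitions; the result takes the larger count
    System.out.println(small.join(large).rdd().getNumPartitions()); // prints 6

    jsc.stop();
  }
}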

Code example from: apache/incubator-nemo

/**
 * Static method to create a JavaRDD object from a text file.
 *
 * @param sparkContext  the spark context containing configurations.
 * @param minPartitions the minimum number of partitions.
 * @param inputPath     the path of the input text file.
 * @return the new JavaRDD object.
 */
public static JavaRDD<String> of(final SparkContext sparkContext,
                                 final int minPartitions,
                                 final String inputPath) {
  final DAGBuilder<IRVertex, IREdge> builder = new DAGBuilder<>();
  final org.apache.spark.rdd.RDD<String> textRdd = sparkContext.textFile(inputPath, minPartitions);
  final int numPartitions = textRdd.getNumPartitions();
  final IRVertex textSourceVertex = new SparkTextFileBoundedSourceVertex(sparkContext, inputPath, numPartitions);
  textSourceVertex.setProperty(ParallelismProperty.of(numPartitions));
  builder.addVertex(textSourceVertex);
  return new JavaRDD<>(textRdd, sparkContext, builder.buildWithoutSourceSinkCheck(), textSourceVertex);
}
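Note that the minPartitions argument to SparkContext.textFile is only a lower bound hint; depending on the input splits, the resulting RDD can end up with more partitions, which is why the code above reads the actual count back with getNumPartitions instead of reusing minPartitions. A minimal check in plain Spark (the input path is a placeholder assumption):

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.rdd.RDD;

public class TextFilePartitionsSketch {
  public static void main(String[] args) {
    SparkContext sc = new SparkContext(
        new SparkConf().setAppName("textfile-partitions-sketch").setMaster("local[2]"));

    // "input.txt" is a placeholder path; 2 is a suggested minimum, not an exact partition count
    RDD<String> textRdd = sc.textFile("input.txt", 2);
    System.out.println(textRdd.getNumPartitions()); // >= 2, depending on the input splits

    sc.stop();
  }
}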

Code example from: apache/incubator-nemo

/**
 * Static method to create a JavaRDD object from a Dataset.
 *
 * @param sparkSession spark session containing configurations.
 * @param dataset      dataset to read initial data from.
 * @param <T>          type of the resulting object.
 * @return the new JavaRDD object.
 */
public static <T> JavaRDD<T> of(final SparkSession sparkSession,
                                final Dataset<T> dataset) {
  final DAGBuilder<IRVertex, IREdge> builder = new DAGBuilder<>();
  final IRVertex sparkBoundedSourceVertex = new SparkDatasetBoundedSourceVertex<>(sparkSession, dataset);
  final org.apache.spark.rdd.RDD<T> sparkRDD = dataset.sparkRDD();
  sparkBoundedSourceVertex.setProperty(ParallelismProperty.of(sparkRDD.getNumPartitions()));
  builder.addVertex(sparkBoundedSourceVertex);
  return new JavaRDD<>(
      sparkRDD, sparkSession.sparkContext(), builder.buildWithoutSourceSinkCheck(), sparkBoundedSourceVertex);
}
