Usage and code examples for the org.apache.flink.examples.java.clustering.KMeans class

Reposted by x33g5p2x on 2022-01-24 under "Other"

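This article collects code examples for the Java class org.apache.flink.examples.java.clustering.KMeans and shows how the class is used. The examples come mainly from GitHub, Stack Overflow, Maven, and similar platforms, and were extracted from selected projects, so they should serve as a useful reference. Details of the KMeans class are as follows:
Package path: org.apache.flink.examples.java.clustering.KMeans
Class name: KMeans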

About KMeans

This example implements a basic K-Means clustering algorithm.

K-Means is an iterative clustering algorithm and works as follows:
K-Means is given a set of data points to be clustered and an initial set of K cluster centers. In each iteration, the algorithm computes the distance of each data point to each cluster center. Each point is assigned to the cluster center which is closest to it. Subsequently, each cluster center is moved to the center (mean) of all points that have been assigned to it. The moved cluster centers are fed into the next iteration. The algorithm terminates after a fixed number of iterations (as in this implementation) or if cluster centers do not (significantly) move in an iteration.
See the Wikipedia entry on the K-Means clustering algorithm for background.

This implementation works on two-dimensional data points.
It computes an assignment of data points to cluster centers, i.e., each data point is annotated with the id of the final cluster (center) it belongs to.
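
The loop described above can be sketched in plain Java, without any Flink API. This is a minimal illustration under stated assumptions, not the code of the Flink example: the class name `KMeansSketch`, its method names, and the sample data are all hypothetical, and the choice to leave a center in place when no point is assigned to it is an assumption made here.

```java
import java.util.Arrays;

// Plain-Java sketch of the K-Means loop described above (no Flink involved).
// All names in this class are hypothetical.
final class KMeansSketch {

    /** Returns the index of the center closest to p (squared Euclidean distance). */
    static int nearest(double[] p, double[][] centers) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double dx = p[0] - centers[c][0];
            double dy = p[1] - centers[c][1];
            double d = dx * dx + dy * dy;
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    /** Runs a fixed number of iterations and returns the moved centers. */
    static double[][] run(double[][] points, double[][] initialCenters, int iterations) {
        double[][] centers = new double[initialCenters.length][];
        for (int c = 0; c < centers.length; c++) {
            centers[c] = initialCenters[c].clone();
        }
        for (int it = 0; it < iterations; it++) {
            double[][] sum = new double[centers.length][2];
            int[] count = new int[centers.length];
            // Assign every point to its closest center and accumulate coordinates.
            for (double[] p : points) {
                int c = nearest(p, centers);
                sum[c][0] += p[0];
                sum[c][1] += p[1];
                count[c]++;
            }
            // Move each center to the mean of its assigned points.
            for (int c = 0; c < centers.length; c++) {
                if (count[c] > 0) { // assumption: a center without points stays put
                    centers[c] = new double[] {sum[c][0] / count[c], sum[c][1] / count[c]};
                }
            }
        }
        return centers;
    }

    /** Annotates each point with the index of its final center. */
    static int[] assign(double[][] points, double[][] centers) {
        int[] ids = new int[points.length];
        for (int i = 0; i < points.length; i++) {
            ids[i] = nearest(points[i], centers);
        }
        return ids;
    }

    public static void main(String[] args) {
        double[][] points = {{1.0, 1.0}, {1.0, 3.0}, {9.0, 9.0}, {11.0, 9.0}};
        double[][] centers = run(points, new double[][] {{0.0, 0.0}, {10.0, 10.0}}, 10);
        System.out.println(Arrays.deepToString(centers)); // [[1.0, 2.0], [10.0, 9.0]]
        System.out.println(Arrays.toString(assign(points, centers))); // [0, 0, 1, 1]
    }
}
```

The sketch stops after a fixed number of iterations, matching the termination rule the description attributes to this implementation; a convergence check on center movement would be the alternative it mentions.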

Input files are plain text files and must be formatted as follows:

  • Data points are represented as two double values separated by a blank character. Data points are separated by newline characters.
    For example "1.2 2.3\n5.3 7.2\n" gives two data points (x=1.2, y=2.3) and (x=5.3, y=7.2).
  • Cluster centers are represented by an integer id and a point value.
    For example "1 6.2 3.2\n2 2.9 5.7\n" gives two centers (id=1, x=6.2, y=3.2) and (id=2, x=2.9, y=5.7).
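
To make the two input formats concrete, here is a small plain-Java parser for them. It is only a sketch: `KMeansInputSketch`, its nested `Centroid` class, and both method names are hypothetical helpers invented for illustration, and the Flink example reads its files through its own mechanism, which this article does not show.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink: parses the two text formats described above.
final class KMeansInputSketch {

    /** Simple holder for a cluster center: integer id plus a two-dimensional point. */
    static final class Centroid {
        final int id;
        final double x;
        final double y;

        Centroid(int id, double x, double y) {
            this.id = id;
            this.x = x;
            this.y = y;
        }
    }

    /** Parses newline-separated "x y" lines into {x, y} pairs. */
    static List<double[]> parsePoints(String text) {
        List<double[]> points = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] f = line.trim().split(" ");
            points.add(new double[] {Double.parseDouble(f[0]), Double.parseDouble(f[1])});
        }
        return points;
    }

    /** Parses newline-separated "id x y" lines into Centroid objects. */
    static List<Centroid> parseCentroids(String text) {
        List<Centroid> centroids = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] f = line.trim().split(" ");
            centroids.add(new Centroid(
                Integer.parseInt(f[0]), Double.parseDouble(f[1]), Double.parseDouble(f[2])));
        }
        return centroids;
    }

    public static void main(String[] args) {
        List<double[]> points = parsePoints("1.2 2.3\n5.3 7.2\n");
        List<Centroid> centroids = parseCentroids("1 6.2 3.2\n2 2.9 5.7\n");
        System.out.println(points.size() + " points, " + centroids.size() + " centroids");
    }
}
```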

Usage: KMeans --points <path> --centroids <path> --output <path> --iterations <n>
If no parameters are provided, the program is run with default data from org.apache.flink.examples.java.clustering.util.KMeansData and 10 iterations.
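
The usage line implies `--key value` pairs with a fallback to defaults. A plain-Java stand-in for that behavior might look like the sketch below. `KMeansArgsSketch` and its methods are hypothetical and written only for illustration; how the real example parses its arguments is not shown in this article. The default of 10 iterations is taken from the text above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for command-line handling; not the Flink implementation.
final class KMeansArgsSketch {

    /** Collects "--key value" pairs into a map; keys are stored without the leading dashes. */
    static Map<String, String> parse(String[] args) {
        if (args.length % 2 != 0) {
            throw new IllegalArgumentException("Arguments must come in --key value pairs");
        }
        Map<String, String> params = new HashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            if (!args[i].startsWith("--")) {
                throw new IllegalArgumentException("Expected --key but got: " + args[i]);
            }
            params.put(args[i].substring(2), args[i + 1]);
        }
        return params;
    }

    /** Returns the iteration count, falling back to 10 when --iterations is absent. */
    static int iterations(Map<String, String> params) {
        return Integer.parseInt(params.getOrDefault("iterations", "10"));
    }

    /** True when no --points path was given, i.e. default data would be used. */
    static boolean usesDefaultPoints(Map<String, String> params) {
        return !params.containsKey("points");
    }

    public static void main(String[] args) {
        Map<String, String> params =
            parse(new String[] {"--points", "points.txt", "--iterations", "20"});
        System.out.println(params.get("points")); // points.txt
        System.out.println(iterations(params));   // 20
        System.out.println(iterations(parse(new String[0]))); // 10
    }
}
```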

This example shows how to use:

  • Bulk iterations
  • Broadcast variables in bulk iterations
  • Custom Java objects (POJOs)

Code examples

Code example source: apache/flink

DataSet<Point> points = getPointDataSet(params, env);
DataSet<Centroid> centroids = getCentroidDataSet(params, env);

Code example source: apache/flink

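TestingExecutionEnvironment.setAsNext(validator, parallelism);
// First run: no arguments, so the example falls back to its default data.
KMeans.main(new String[0]);
// Second run: explicit arguments.
KMeans.main(new String[] {
  "--points", tmpDir,
  "--centroids", tmpDir,
  // ... (remaining arguments are cut off in the source excerpt)
});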

Code example source: org.apache.flink/flink-java-examples

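public static void main(String[] args) throws Exception {
  // Stop early if the command-line arguments are invalid.
  if (!parseParameters(args)) {
    return;
  }
  // ... (creation of env is omitted in the source excerpt)
  DataSet<Point> points = getPointDataSet(env);
  DataSet<Centroid> centroids = getCentroidDataSet(env);
  // ... (rest of the method is omitted in the source excerpt)
}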

Code example source: org.apache.flink/flink-examples-batch_2.10

DataSet<Point> points = getPointDataSet(params, env);
DataSet<Centroid> centroids = getCentroidDataSet(params, env);

Code example source: apache/flink

@Test
public void dumpIterativeKMeans() {
  // prepare the test environment
  PreviewPlanEnvironment env = new PreviewPlanEnvironment();
  env.setAsContext();
  try {
    KMeans.main(new String[] {
      "--points", IN_FILE,
      "--centroids", IN_FILE,
      "--output", OUT_FILE,
      "--iterations", "123"});
  } catch (OptimizerPlanEnvironment.ProgramAbortException pae) {
    // all good.
  } catch (Exception e) {
    e.printStackTrace();
    Assert.fail("KMeans failed with an exception");
  }
  dump(env.getPlan());
}

Code example source: com.alibaba.blink/flink-examples-batch

DataSet<Point> points = getPointDataSet(params, env);
DataSet<Centroid> centroids = getCentroidDataSet(params, env);

Code example source: org.apache.flink/flink-examples-batch

DataSet<Point> points = getPointDataSet(params, env);
DataSet<Centroid> centroids = getCentroidDataSet(params, env);
