How can you integrate Mahout k-means clustering into an application?

Posted by 7fyelxc5 on 2021-06-04 in Hadoop


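I am trying to use Mahout k-means in a simple application. I build a set of vectors manually from database content, and I just want to feed these vectors to Mahout (0.9), for example to KMeansClusterer, and use the output.
I have read Mahout in Action (its examples target version 0.5) and many online forums for background. However, I can no longer find a way to run Mahout k-means (or a related clusterer) through Hadoop without dealing with file names and file paths. The documentation is very sketchy: can Mahout still be used this way? Are there any current examples of using Mahout k-means that do not go through the command line?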

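private List<Cluster> kMeans(List<Vector> allvectors, double closeness, int numclusters, int iterations) {
    List<Cluster> clusters = new ArrayList<Cluster>();

    // Seed numclusters initial clusters (here simply the first numclusters vectors);
    // seeding one cluster per vector would produce as many clusters as points
    int clusterId = 0;
    for (Vector v : allvectors.subList(0, numclusters)) {
        clusters.add(new Kluster(v, clusterId++, new EuclideanDistanceMeasure()));
    }

    List<List<Cluster>> finalclusters = KMeansClusterer.clusterPoints(allvectors, clusters, 0.01, numclusters, 10);

    // The last list holds the clusters produced by the final iteration
    List<Cluster> result = finalclusters.get(finalclusters.size() - 1);
    for (Cluster cluster : result) {
        System.out.println("Cluster id: " + cluster.getId() + " center: " + cluster.getCenter().asFormatString());
    }

    // Return the converged clusters, not the initial seeds
    return result;
}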
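To make clear what an in-memory k-means call like the one above is meant to compute, here is a minimal plain-Java sketch of Lloyd's algorithm that uses no Mahout or Hadoop classes. It is not Mahout's implementation: the class `SimpleKMeans` and its methods are hypothetical names, points are assumed to be dense `double[]` arrays, the first `numClusters` points are assumed to serve as initial centroids, and `closeness` and `iterations` play the same roles as the question's parameters.

```java
import java.util.Arrays;
import java.util.List;

/** Minimal in-memory Lloyd's k-means (a sketch, not Mahout's KMeansClusterer). */
public final class SimpleKMeans {

    private SimpleKMeans() {}

    public static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Index of the centroid closest to p. */
    public static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (euclidean(p, centroids[c]) < euclidean(p, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /**
     * Seeds with the first numClusters points, then alternates assignment and update
     * steps until no centroid moves more than closeness or iterations rounds have run.
     */
    public static double[][] cluster(List<double[]> points, int numClusters, double closeness, int iterations) {
        if (numClusters < 1 || numClusters > points.size()) {
            throw new IllegalArgumentException("need 1 <= numClusters <= points.size()");
        }
        int dim = points.get(0).length;
        double[][] centroids = new double[numClusters][];
        for (int c = 0; c < numClusters; c++) {
            centroids[c] = points.get(c).clone();
        }
        for (int iter = 0; iter < iterations; iter++) {
            double[][] sums = new double[numClusters][dim];
            int[] counts = new int[numClusters];
            // Assignment step: add each point to the running sum of its nearest centroid
            for (double[] p : points) {
                int c = nearest(p, centroids);
                counts[c]++;
                for (int d = 0; d < dim; d++) sums[c][d] += p[d];
            }
            // Update step: move each centroid to the mean of its assigned points
            boolean converged = true;
            for (int c = 0; c < numClusters; c++) {
                if (counts[c] == 0) continue; // an empty cluster keeps its old centroid
                double[] updated = new double[dim];
                for (int d = 0; d < dim; d++) updated[d] = sums[c][d] / counts[c];
                if (euclidean(updated, centroids[c]) > closeness) converged = false;
                centroids[c] = updated;
            }
            if (converged) break;
        }
        return centroids;
    }

    public static void main(String[] args) {
        List<double[]> pts = Arrays.asList(
            new double[] {0, 0}, new double[] {10, 10},
            new double[] {0, 1}, new double[] {10, 11});
        for (double[] c : cluster(pts, 2, 0.01, 10)) {
            System.out.println(Arrays.toString(c));
        }
    }
}
```

Running `main` prints `[0.0, 0.5]` and `[10.0, 10.5]`; the second pass moves no centroid, so the loop stops long before `iterations` is used up. Because everything stays in memory, this approach only suits data that fits in a single JVM.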

hadoop k-means mahout

Source: https://stackoverflow.com/questions/22389214/how-can-you-integrate-mahout-kmeans-clustering-into-application

1 answer


v1uwarro1#

First, you need to write the vectors to a SequenceFile. The code is as follows:

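import java.io.File;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.NamedVector;
import org.apache.mahout.math.VectorWritable;
import com.google.common.io.Closeables;

// One VectorWritable per data point; userName is the caller-supplied label for the vector
List<VectorWritable> vectors = new ArrayList<>();
double[] vectorValues = { /* <your vector values> */ };
vectors.add(new VectorWritable(new NamedVector(new DenseVector(vectorValues), userName)));

// writeFile is the caller-supplied path of the sequence file to create
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(new File(writeFile).toURI(), conf);
SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, new Path(writeFile), Text.class, VectorWritable.class);

try {
  int i = 0;
  for (VectorWritable vw : vectors) {
    // Key each vector with a unique Text id
    writer.append(new Text("mapred_" + i++), vw);
  }
} finally {
  Closeables.close(writer, false);
}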

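Then use the following lines to generate the clusters. K-means has to be given initial clusters, so I use Canopy to generate them.
However, you will not be able to read the clustering output directly, because it is stored in SequenceFile format. You need to run the ClusterDumper class from mahout-integration.jar to read and interpret the final clusters.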

Configuration conf = new Configuration();
// Canopy generates the initial clusters (T1 = 3.1 must be larger than T2 = 2.1)
CanopyDriver.run(conf, new Path(inputPath), new Path(canopyOutputPath), new ManhattanDistanceMeasure(), 3.1, 2.1, true, 0.5, true);

// Now run the KMeansDriver job, seeded with the final canopy clusters;
// the cluster classification threshold should lie between 0 and 1 (0.0 keeps every point)
KMeansDriver.run(conf, new Path(inputPath), new Path(canopyOutputPath + "/clusters-0-final/"), new Path(kmeansOutput), new EuclideanDistanceMeasure(), 0.001, 10, true, 0.0, false);
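The answer relies on Canopy to choose the initial k-means centroids. As a rough illustration of how the T1/T2 thresholds passed to `CanopyDriver.run` (3.1 and 2.1 above) interact, here is a simplified single-pass canopy sketch in plain Java. It is not Mahout's CanopyDriver: `SimpleCanopy` and `canopyCenters` are hypothetical names, the first remaining point is assumed to seed each canopy (instead of a random one), and each canopy's center is taken as the mean of its member points.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Simplified canopy generation (a sketch, not Mahout's CanopyDriver). */
public final class SimpleCanopy {

    private SimpleCanopy() {}

    /** Manhattan (L1) distance, matching the ManhattanDistanceMeasure used above. */
    public static double manhattan(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
        return sum;
    }

    /**
     * Returns one center per canopy. Points within t1 of a seed join its canopy;
     * points within t2 are also removed, so they can never seed another canopy.
     */
    public static List<double[]> canopyCenters(List<double[]> points, double t1, double t2) {
        if (t2 <= 0 || t1 <= t2) {
            throw new IllegalArgumentException("need t1 > t2 > 0");
        }
        List<double[]> remaining = new ArrayList<>(points);
        List<double[]> centers = new ArrayList<>();
        while (!remaining.isEmpty()) {
            double[] seed = remaining.get(0);
            double[] sum = new double[seed.length];
            int count = 0;
            Iterator<double[]> it = remaining.iterator();
            while (it.hasNext()) {
                double[] p = it.next();
                double dist = manhattan(seed, p);
                if (dist < t1) { // loose threshold: p belongs to this canopy
                    for (int d = 0; d < sum.length; d++) sum[d] += p[d];
                    count++;
                }
                if (dist < t2) { // tight threshold: p (including the seed) is consumed
                    it.remove();
                }
            }
            for (int d = 0; d < sum.length; d++) sum[d] /= count;
            centers.add(sum);
        }
        return centers;
    }

    public static void main(String[] args) {
        List<double[]> pts = Arrays.asList(
            new double[] {0, 0}, new double[] {1, 0},
            new double[] {10, 10}, new double[] {10, 11});
        // Same T1/T2 values as the CanopyDriver call above
        for (double[] c : canopyCenters(pts, 3.1, 2.1)) {
            System.out.println(Arrays.toString(c));
        }
    }
}
```

With these thresholds the four sample points form two canopies centered at `[0.5, 0.0]` and `[10.0, 10.5]`. Centers like these are what k-means receives as its initial clusters, which is the role the `clusters-0-final` directory plays in the driver call.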

Answered 2021-06-04
