Mahout transpose matrix

7qhs6swi posted on 2021-06-04 in Hadoop


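I am a newbie. I am trying to transpose a matrix with the mahout transpose command line.
Each line in my data source file looks like this: 1;456;789;012;.... . The key is the first element of each line ("1" in this example). Each line is one vector of the matrix.
I tried changing the separator to "," or to a space " ", but it did not help.
To transpose the matrix, I first converted the HDFS data file to a sequence file with this command: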

mahout seqdirectory -c utf-8 -i /test/myfile -p /test/myfile_seq

Then I tried to convert the sequence file to vectors with this command:

mahout seq2sparse -i /test/myfile_seq/chunk-0 -o /test/myfile_vector

Then I ran this command:

sudo -u hdfs mahout transpose --input  /test/myfile_vector//tfidf-vectors/part-r-00000 --numRows 5 --numCols 24

I have a few questions:

- Which separator should be used in the source data file?
- What should the output of the "mahout seqdirectory" command be?
- Do I need to convert my sequence file to vectors before transposing?

hadoop transpose mahout

Source: https://stackoverflow.com/questions/22175781/mahout-transpose-matrix

1 answer


9njqaruj1#

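Please post Mahout-related questions to the Mahout user@ mailing list to get a quicker and more definite answer from the Mahout committers.
Mahout's TransposeJob expects a matrix as input and will not work with individual vectors like the ones you have. The input format does not really matter. You could have a CSV file and parse each line.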
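
To make "parse each line" concrete, here is a minimal sketch in plain Python. It is not Mahout code: the function name `parse_row` and the default ";" separator are assumptions taken from the sample line in the question, and it only shows how a line splits into a key and a vector of values.

```python
from typing import List, Tuple


def parse_row(line: str, sep: str = ";") -> Tuple[str, List[float]]:
    """Split one input line into (key, values).

    Hypothetical helper: the first field is treated as the row key and
    the remaining fields as the numeric entries of that row's vector.
    """
    # Drop empty fields so a trailing separator does not break parsing.
    fields = [f.strip() for f in line.strip().split(sep) if f.strip()]
    if not fields:
        raise ValueError("empty line")
    key = fields[0]
    values = [float(f) for f in fields[1:]]
    return key, values


key, values = parse_row("1;456;789;012")
print(key, values)
```

Note that a field such as 012 is read as the number 12.0, so any leading zeros are lost; if they matter, the values would have to stay strings.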
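Here are the steps for what you are trying to accomplish:
1. Convert your input CSV file into NamedVectors, where the VectorID is the key in your case. Look at the code of Mahout's CSVIterator and adapt it to handle NamedVectors and to parse each line of your input.
2. Run Mahout's RowIdJob on the NamedVectors to create a matrix of all the vectors. Each row of the matrix is one line of your input. RowIdJob produces two outputs, matrix and docIndex:
   - matrix: an M*N matrix of all the vectors.
   - docIndex: a mapping of DocumentID to DocumentName (in your case it maps DocumentID to your key).
3. Feed the matrix output of the previous step as input to TransposeJob. You need to specify the number of rows and columns on the CLI.
If you have any more questions, please post them to Mahout user@.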
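
The data flow of steps 2 and 3 can be sketched in plain Python. This is an illustration of the idea only, not Mahout's implementation: `assign_row_ids` and `transpose` are hypothetical helpers, and the input is assumed to be already parsed into (key, values) pairs.

```python
from typing import Dict, List, Tuple

NamedRow = Tuple[str, List[float]]


def assign_row_ids(named: List[NamedRow]) -> Tuple[Dict[int, List[float]], Dict[int, str]]:
    """Give each named vector an integer row id.

    Returns (matrix, doc_index): matrix maps row id -> values,
    doc_index maps row id -> original key.
    """
    matrix = {row_id: vec for row_id, (_, vec) in enumerate(named)}
    doc_index = {row_id: key for row_id, (key, _) in enumerate(named)}
    return matrix, doc_index


def transpose(matrix: Dict[int, List[float]], num_rows: int, num_cols: int) -> Dict[int, List[float]]:
    """Transpose a num_rows x num_cols matrix stored as row id -> values."""
    out = {c: [0.0] * num_rows for c in range(num_cols)}
    for r, vec in matrix.items():
        for c, value in enumerate(vec):
            out[c][r] = value
    return out


named = [("1", [456.0, 789.0, 12.0]), ("2", [1.0, 2.0, 3.0])]
matrix, doc_index = assign_row_ids(named)
transposed = transpose(matrix, num_rows=2, num_cols=3)
print(doc_index)
print(transposed)
```

The row and column counts are passed explicitly here to mirror the point in step 3 that the dimensions must be supplied; entries missing from a row are left as 0.0.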

Answered 2021-06-04
