Is it possible to do a top ten and a join with Hadoop in one job?

pn9klfpd, posted on 2021-05-30 in Hadoop

I have two files: post and users. I need to get the top 10 users by number of posts. In SQL it would be:

SELECT us.name, COUNT(po.id) AS NumberOfPost FROM User us INNER JOIN Post po on
 po.userId = us.id GROUP BY us.name ORDER BY NumberOfPost DESC;

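Can this be done with a single job, without one job for the join and another for the top ten? I have to follow the MapReduce "top 10" pattern, but in this case I am not required to follow any join pattern. Is there a way to do it in just one job?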

hujrc8aj #1

This is best implemented in Hive. Run the query below to get the top 10:

SELECT us.name, COUNT(po.id) AS NumberOfPost
FROM User us INNER JOIN Post po ON po.userId = us.id
GROUP BY us.name
ORDER BY NumberOfPost DESC
LIMIT 10;
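
If the work has to stay in plain MapReduce, here is a rough sketch of how a single job could produce the same result as the query above. It is a plain-Python simulation of the data flow, not Hadoop API code. It assumes the users file is small enough to load into memory on every mapper (a replicated, map-side join) and that one reducer does the counting and keeps the top 10. The function name `top_users_by_posts` and the tuple layouts `(user_id, name)` and `(post_id, user_id)` are made up for illustration.

```python
import heapq
from collections import Counter


def top_users_by_posts(users, posts, n=10):
    """Simulate one MapReduce job: in-memory join, count, then top n.

    users: iterable of (user_id, name)     -- assumed small enough for memory
    posts: iterable of (post_id, user_id)  -- the large input that is streamed
    """
    # "Setup" step: load the small users table into a lookup map,
    # as a mapper would do for a replicated (map-side) join.
    names = {user_id: name for user_id, name in users}

    # "Map" + "reduce" steps: emit the user name for each post and count.
    # Posts whose user_id is unknown are dropped, like an INNER JOIN.
    # Counting by name mirrors GROUP BY us.name in the query.
    counts = Counter()
    for _post_id, user_id in posts:
        name = names.get(user_id)
        if name is not None:
            counts[name] += 1

    # "Cleanup" step of the single reducer: keep only the n largest counts.
    return heapq.nlargest(n, counts.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    users = [(1, "ann"), (2, "bob"), (3, "cy")]
    posts = [(10, 1), (11, 1), (12, 2), (13, 9), (14, 1)]
    for name, number_of_posts in top_users_by_posts(users, posts, n=2):
        print(name, number_of_posts)
```

With more than one reducer, each reducer would only see part of the keys and could emit only a local top 10, so the partial lists would still need a final merge. If the users file does not fit in memory, the join would have to happen on the reduce side instead.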
