How do I implement a sort-merge bucket map join?

Posted by kx1ctssn on 2021-06-02 in Hadoop


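I want to join two tables that share a common column, have the same number of buckets, and are sorted the same way.
Apart from setting the properties below, are there any other conditions I need to satisfy?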

set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

hadoop Hive merge sorting Bucket

Source: https://stackoverflow.com/questions/26269189/how-to-implement-sort-merge-bucketing-map-join

1 answer


zbwhf8kr1#

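If two datasets are too large for a map-side join, an efficient way to join them is to sort both datasets into buckets.
The trick is to cluster and sort both tables by the same join key.

CREATE TABLE `order` (cid int, price float, quantity int) CLUSTERED BY (cid) SORTED BY (cid) INTO 32 BUCKETS;
CREATE TABLE customer (id int, first string, last string) CLUSTERED BY (id) SORTED BY (id) INTO 32 BUCKETS;

This provides two major optimization benefits: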

Sorting by the join key makes joins easy: all possible matching values reside in the same area on disk.

Hash bucketing on the join key ensures that all matching values reside on the same node, so an equi-join can then run with no shuffle.
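
To make this concrete, here is a minimal HiveQL sketch of the whole flow. It is not a definitive recipe. It assumes the `order` and `customer` tables above are declared both CLUSTERED BY and SORTED BY their join key with the same bucket count. The staging tables `order_staging` and `customer_staging` are hypothetical names used only for illustration. The `hive.enforce.*` settings only matter on Hive releases before 2.0, and whether `hive.auto.convert.sortmerge.join` is already enabled depends on your version and distribution.

```sql
-- Before Hive 2.0 only: make INSERT honor the declared bucketing and sort order.
SET hive.enforce.bucketing = true;
SET hive.enforce.sorting = true;

-- The properties from the question.
SET hive.optimize.bucketmapjoin = true;
SET hive.optimize.bucketmapjoin.sortedmerge = true;
SET hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

-- Ask the optimizer to convert eligible joins to sort-merge bucket joins.
SET hive.auto.convert.sortmerge.join = true;

-- Populate the tables through INSERT so that Hive writes the bucketed, sorted files itself.
-- order_staging and customer_staging are hypothetical source tables.
INSERT OVERWRITE TABLE `order`
SELECT cid, price, quantity FROM order_staging;

INSERT OVERWRITE TABLE customer
SELECT id, `first`, `last` FROM customer_staging;

-- Equi-join on the bucketing/sorting key; inspect the plan before running the query.
EXPLAIN
SELECT c.id, o.price, o.quantity
FROM customer c
JOIN `order` o ON c.id = o.cid;
```

If the conversion applies, the plan should contain a sorted-merge bucket map join operator instead of a reduce-side join. If it does not, the usual suspects are data that was copied into the table directories without going through Hive (so the files are not really bucketed or sorted), or a join condition that does not use the bucketing key.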

Answered on 2021-06-03
