Hive: Map joins in partitioned tables

Posted by u91tlkcl on 2021-06-26 in Hive


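Consider a typical data warehouse scenario in Hive with fact and dimension tables, where the fact table is split across multiple data nodes through partitions. When joining the fact table (partitioned) with a dimension table (not partitioned), a map join seems the logical choice: the dimension tables are small, so they can be held in memory and joined efficiently with the fact data on every node.
However, a few online resources suggest that for a map join on partitioned tables, the partition key of both tables must be the same as the join key.
So here is the question I am looking to have answered:
Can a partitioned table (fact) be map joined with a non-partitioned table (dimension)?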

Hive hiveql hadoop-partitioning

Source: https://stackoverflow.com/questions/44498170/hive-map-joins-in-partitioned-tables

1 answer


i7uq4tfw1#

The answer is yes.
Map Join Operator
Demo

create table fact (rec_id int,dim_id int) partitioned by (dt date);
create table dim  (dim_id int,descr string);

explain
select  *
from    fact f join dim d
        on d.dim_id = f.dim_id;

STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        d 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        d 
          TableScan
            alias: d
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Filter Operator
              predicate: dim_id is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 dim_id (type: int)
                  1 dim_id (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: f
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Filter Operator
              predicate: dim_id is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Map Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 dim_id (type: int)
                  1 dim_id (type: int)
                outputColumnNames: _col0, _col1, _col2, _col6, _col7
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                Select Operator
                  expressions: _col0 (type: int), _col1 (type: int), _col2 (type: date), _col6 (type: int), _col7 (type: string)
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4
                  Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
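
The plan above has the shape of a map join: Stage-4 is local work that loads `d` through a HashTable Sink Operator, and Stage-3 is a map-only job that passes `f` through the Map Join Operator, with no reduce stage. As a minimal sketch, the settings below are the ones that usually govern this automatic conversion. The values and the date literal are illustrative assumptions, not part of the original demo, and defaults differ between Hive versions.

```sql
-- Allow Hive to convert a common join into a map join when one side is small.
set hive.auto.convert.join=true;
-- Size threshold in bytes below which a table is treated as "small" (illustrative value).
set hive.mapjoin.smalltable.filesize=25000000;

-- Same join as in the demo, restricted to a single partition of the fact table.
-- The date literal is hypothetical.
explain
select  *
from    fact f join dim d
        on d.dim_id = f.dim_id
where   f.dt = date '2017-06-12';
```

If the resulting plan still shows a HashTable Sink Operator for `d` and a Map Join Operator in the map stage, the join is running as a map join. The partition filter on `fact` only narrows which partitions are scanned.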

Answered 2021-06-26
