ROW_NUMBER() PARTITION BY performance

Posted by gg58donl on 2021-06-26 in Hive

How can I improve the performance of ROW_NUMBER() OVER (PARTITION BY ...) in a Hive query?

SELECT *
FROM (
    SELECT
          '123' AS run_session_id
        , tbl1.transaction_id
        , tbl1.src_transaction_id
        , tbl1.transaction_created_epoch_time
        , tbl1.currency
        , tbl1.event_type
        , tbl1.event_sub_type
        , tbl1.estimated_total_cost
        , tbl1.actual_total_cost
        , tbl1.tfc_export_created_epoch_time
        , tbl1.authorizer
        , tbl1.acquirer
        , tbl1.processor
        , tbl1.company_code
        , tbl1.country_of_account
        , tbl1.merchant_id
        , tbl1.client_id
        , tbl1.ft_id
        , tbl1.transaction_created_date
        , tbl1.event_pst_time
        , tbl1.extract_id_seq
        , tbl1.src_type
        , ROW_NUMBER() OVER (PARTITION BY tbl1.transaction_id ORDER BY tbl1.event_pst_time DESC) AS seq_num  -- while writing back to the pfit events table, write each event so that event_pst_time is populated correctly
    FROM nest.nest_cost_events tbl1  --<hiveFinalDB>-- DB variables won't work, so the DB name must be changed accordingly for testing and PROD deployment
    WHERE extract_id_seq BETWEEN 275 - 60 AND 275
      AND event_type IN ('ACT', 'CBR', 'SKU', 'CAL', 'KIT', 'BXT')
) tbl1
WHERE seq_num = 1;

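The table is partitioned by src_type. The query currently takes 20 minutes to process 154 million records, and I would like to bring that down to 10 minutes.
Any suggestions?
Thanks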
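
One direction that is sometimes tried for this "latest row per key" pattern is to replace the window function with a GROUP BY over a struct, so that Hive can pre-aggregate on the map side instead of shuffling and sorting every full row by transaction_id. The following is only a sketch and has not been run against this table. It assumes that MAX() accepts struct values in the Hive version in use, and that event_pst_time compares correctly in its stored type. The alias names agg and latest are made up for illustration. Unlike ROW_NUMBER(), ties on event_pst_time are broken by the remaining struct fields in order, and the output has no seq_num column.

```sql
-- Sketch only: assumes MAX() over a struct is supported by the Hive version in use.
-- Structs compare field by field, so event_pst_time is placed first to drive the ordering.
SELECT
      '123'                                     AS run_session_id
    , agg.transaction_id
    , agg.latest.src_transaction_id             AS src_transaction_id
    , agg.latest.transaction_created_epoch_time AS transaction_created_epoch_time
    , agg.latest.currency                       AS currency
    , agg.latest.event_type                     AS event_type
    , agg.latest.event_sub_type                 AS event_sub_type
    , agg.latest.estimated_total_cost           AS estimated_total_cost
    , agg.latest.actual_total_cost              AS actual_total_cost
    , agg.latest.tfc_export_created_epoch_time  AS tfc_export_created_epoch_time
    , agg.latest.authorizer                     AS authorizer
    , agg.latest.acquirer                       AS acquirer
    , agg.latest.processor                      AS processor
    , agg.latest.company_code                   AS company_code
    , agg.latest.country_of_account             AS country_of_account
    , agg.latest.merchant_id                    AS merchant_id
    , agg.latest.client_id                      AS client_id
    , agg.latest.ft_id                          AS ft_id
    , agg.latest.transaction_created_date       AS transaction_created_date
    , agg.latest.event_pst_time                 AS event_pst_time
    , agg.latest.extract_id_seq                 AS extract_id_seq
    , agg.latest.src_type                       AS src_type
FROM (
    SELECT
          tbl1.transaction_id
        , MAX(NAMED_STRUCT(
              'event_pst_time',                 tbl1.event_pst_time
            , 'src_transaction_id',             tbl1.src_transaction_id
            , 'transaction_created_epoch_time', tbl1.transaction_created_epoch_time
            , 'currency',                       tbl1.currency
            , 'event_type',                     tbl1.event_type
            , 'event_sub_type',                 tbl1.event_sub_type
            , 'estimated_total_cost',           tbl1.estimated_total_cost
            , 'actual_total_cost',              tbl1.actual_total_cost
            , 'tfc_export_created_epoch_time',  tbl1.tfc_export_created_epoch_time
            , 'authorizer',                     tbl1.authorizer
            , 'acquirer',                       tbl1.acquirer
            , 'processor',                      tbl1.processor
            , 'company_code',                   tbl1.company_code
            , 'country_of_account',             tbl1.country_of_account
            , 'merchant_id',                    tbl1.merchant_id
            , 'client_id',                      tbl1.client_id
            , 'ft_id',                          tbl1.ft_id
            , 'transaction_created_date',       tbl1.transaction_created_date
            , 'extract_id_seq',                 tbl1.extract_id_seq
            , 'src_type',                       tbl1.src_type
          )) AS latest                          -- the whole row with the greatest event_pst_time
    FROM nest.nest_cost_events tbl1
    WHERE tbl1.extract_id_seq BETWEEN 275 - 60 AND 275
      AND tbl1.event_type IN ('ACT', 'CBR', 'SKU', 'CAL', 'KIT', 'BXT')
    GROUP BY tbl1.transaction_id
) agg;
```

Whether this is actually faster depends on how many rows share each transaction_id: the fewer duplicates there are, the less the map-side aggregation saves. It would be worth comparing both versions on a sample, and checking that rows with a NULL event_pst_time are handled the way the downstream table expects.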

Hive query-performance

Source: https://stackoverflow.com/questions/44957817/row-number-partition-by-performance