Hive partition query is scanning all partitions

Posted by sqougxex on 2021-05-29 in Hadoop


When I write a Hive query like the one below:

select count(*)
from order
where order_month >= '2016-11';

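Hadoop job information for Stage-1: number of mappers: 5; number of reducers: 1
I get only 5 mappers, which means only the required partitions (2016-11 and 2016-12) are read.
This is the same query written using functions: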

select count(*)
from order
where order_month >= concat(year(DATE_SUB(to_date(from_unixtime(UNIX_TIMESTAMP())),10)),'-',month(DATE_SUB(to_date(from_unixtime(UNIX_TIMESTAMP())),10)));

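Note:
concat(year(DATE_SUB(to_date(from_unixtime(UNIX_TIMESTAMP())),10)),'-',month(DATE_SUB(to_date(from_unixtime(UNIX_TIMESTAMP())),10))) = '2016-11'
Hadoop job information for Stage-1: number of mappers: 216; number of reducers: 1
This time it reads all partitions (2004-10 through 2016-12).
How can I modify the query so that it reads only the required partitions?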

hadoop Hive

Source: https://stackoverflow.com/questions/41077242/hive-partition-query-is-scanning-all-partitions

1 Answer

x33g5p2x1#

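The unix_timestamp() function is non-deterministic, which prevents proper optimization of the query. It has been deprecated since Hive 2.0 in favour of CURRENT_TIMESTAMP and CURRENT_DATE.
With current_date there is also no need to calculate the year and month separately: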

where order_month >= substr(date_sub(current_date, 10),1,7)
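
As a rough sketch of how this filter fits into the original query, assuming the same `order` table and a string partition column `order_month` in `yyyy-MM` form as in the question, something like the following could be used. The backticks around `order` are a precaution added here because ORDER is a keyword, and `EXPLAIN DEPENDENCY` is one way to check which partitions Hive plans to read before the job is actually run.

```sql
-- Sketch only: table and column names are taken from the question.
-- Backticks guard against ORDER being parsed as a keyword.
SELECT count(*)
FROM `order`
WHERE order_month >= substr(date_sub(current_date, 10), 1, 7);

-- List the tables and partitions the query depends on without running it.
-- If pruning works, input_partitions should contain only the recent months.
EXPLAIN DEPENDENCY
SELECT count(*)
FROM `order`
WHERE order_month >= substr(date_sub(current_date, 10), 1, 7);
```

Taking the first seven characters of the date also keeps the month zero-padded (for example 2016-09). Concatenating year() and month() would give 2016-9 for single-digit months, which does not compare correctly as a string against yyyy-MM partition values.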
