当我通过beeline运行以下查询时,WHERE子句中的一个筛选条件(array_item in (20, 30)
)似乎无法正常工作。
- 配置单元版本:
2.1.1
with sample as (
select 1 as col, array(10) as array_col
union all select 2 as col, array(10, 20, 30) as array_col
union all select 3 as col, array(20, 30) as array_col
union all select 4 as col, array(30) as array_col
)
, targets as (
select 2 as col_target
)
select
col
, array_col
, array_item
from
sample
lateral view explode(array_col) a as array_item
where
true
and array_item in (20, 30)
and sample.col in (select col_target from targets)
虽然我希望显示以下结果,
col array_col array_item
2 [10,20,30] 20
2 [10,20,30] 30
查询返回的实际结果如下所示,表明查询未能应用条件array_item in (20, 30)
col array_col array_item
2 [10,20,30] 10
2 [10,20,30] 20
2 [10,20,30] 30
为了验证,我还运行了以下查询。
这似乎被正确地执行了,并且返回了与我预期的相同的结果。
with sample as (
select 1 as col, array(10) as array_col
union all select 2 as col, array(10, 20, 30) as array_col
union all select 3 as col, array(20, 30) as array_col
union all select 4 as col, array(30) as array_col
)
, targets as (
select 2 as col_target
)
select
col
, array_col
, array_item
from
sample
lateral view explode(array_col) a as array_item
where
true
and array_item in (20, 30)
and sample.col in (2) -- changed here: "2" is same as the result of subquery
第一个和第二个查询的区别在于我是在WHERE子句中编写子查询还是子查询返回的结果。
所以我猜这两个查询在逻辑上是相同的,但是它们返回不同的结果。
我不知道为什么第一个查询不考虑array_item in (20, 30)
。
1条答案
按热度按时间m0rkklqb1#
我使用hadoop 3.3.1和tez 0.10.2在Hive4 Alpha2中进行了一些测试
当hive.optimize.ppd对于第一个查询为true时,看起来像是一些bug。
通过设置
hive.optimize.ppd=false
,第一次查询也与第二个版本相同。