HiveQL查询中WHERE子句的筛选条件无法正常工作

pgpifvop  于 2023-01-09  发布在  Hive
关注(0)|答案(1)|浏览(228)

当我通过beeline运行以下查询时,WHERE子句中的一个筛选条件(array_item in (20, 30))似乎无法正常工作。

  • 配置单元版本:2.1.1
with sample as (
              select 1 as col, array(10)         as array_col
    union all select 2 as col, array(10, 20, 30) as array_col
    union all select 3 as col, array(20, 30)     as array_col
    union all select 4 as col, array(30)         as array_col
)

, targets as (
    select 2 as col_target
)

select
    col
    , array_col
    , array_item
from
    sample
    lateral view explode(array_col) a as array_item
where
    true
    and array_item in (20, 30)
    and sample.col in (select col_target from targets)

虽然我希望显示以下结果,

col array_col   array_item
2   [10,20,30]  20
2   [10,20,30]  30

查询返回的实际结果如下所示,表明查询未能应用条件array_item in (20, 30)

col array_col   array_item
2   [10,20,30]  10
2   [10,20,30]  20
2   [10,20,30]  30

为了验证,我还运行了以下查询。
这似乎被正确地执行了,并且返回了与我预期的相同的结果。

with sample as (
              select 1 as col, array(10)         as array_col
    union all select 2 as col, array(10, 20, 30) as array_col
    union all select 3 as col, array(20, 30)     as array_col
    union all select 4 as col, array(30)         as array_col
)

, targets as (
    select 2 as col_target
)

select
    col
    , array_col
    , array_item
from
    sample
    lateral view explode(array_col) a as array_item
where
    true
    and array_item in (20, 30)
    and sample.col in (2)  -- changed here: "2" is same as the result of subquery

第一个和第二个查询的区别在于我是在WHERE子句中编写子查询还是子查询返回的结果。
所以我猜这两个查询在逻辑上是相同的,但是它们返回不同的结果。
我不知道为什么第一个查询不考虑array_item in (20, 30)

m0rkklqb

m0rkklqb1#

我使用hadoop 3.3.1和tez 0.10.2在Hive4 Alpha2中进行了一些测试
当hive.optimize.ppd对于第一个查询为true时,看起来像是一些bug。
通过设置hive.optimize.ppd=false,第一次查询也与第二个版本相同。

相关问题