Impala :为什么我不能对一个查询进行两次计数

nhn9ugyo  于 2021-06-26  发布在  Impala
关注(0)|答案(3)|浏览(454)

我尝试对查询中的不同列进行两次计数:

select  count(distinct color) as cid,
  count(distinct entity) as eid from my_table

上述查询无法处理以下错误:

SQLException: [Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:AnalysisException: 
all DISTINCT aggregate functions need to have the same set of parameters as count(DISTINCT color); deviating function: count(DISTINCT entity)
), Query: select  count(distinct color) as cid,
  count(distinct entity) as eid from my_table

但是,如果我只进行一次计数,查询就会工作。为什么?我能在一次查询中做两次计数吗?谢谢!

bn31dyow

bn31dyow1#

对此的更新-impala 3.1(2018年11月发布)在新的查询块中添加了对多个不同聚合函数的支持。

q9rjltbz

q9rjltbz2#

我不是百分之百肯定这在 Impala 能奏效,但你可以做到 count(distinct) 使用窗口函数和条件聚合。所以,这个查询:

select count(distinct color) as cid,
       count(distinct entity) as eid
from my_table ;

相当于:

select sum(case when seqnum_color = 1 then 1 else 0 end) as cid,
       sum(case when seqnum_entity = 1 then 1 else 0 end) as eid
from (select t.*, 
             row_number() over (partition by color order by color) as seqnum_color,
             row_number() over (partition by entity order by entity) as seqnum_entity
      from my_table t
     ) t;
zqdjd7g9

zqdjd7g93#

impala目前不支持同一查询中的多个count distinct表达式,请参阅impala-110。这是一个需要的特性,但是很难实现,所以还没有添加。
现在,如果不需要精确的精度,可以通过指定 NDV(column) ; 一个查询可以包含多个 NDV(column) . 使 Impala 自动重写 COUNT(DISTINCT) 表达式到 NDV() ,启用 APPX_COUNT_DISTINCT 查询选项(请参阅文档)。

相关问题