incubator-doris [Enhancement] Push predicate to subquery

fsi0uk1n  于 2022-04-22  发布在  Java
关注(0)|答案(0)|浏览(130)

Search before asking

  • I had searched in the issues and found no similar issues.

Description

TPCH q17:
select
sum(l_extendedprice) / 7.0 as avg_yearly
from
lineitem,
part
where
p_partkey = l_partkey
and p_brand = 'Brand#23'
and p_container = 'MED BOX'
and l_quantity < (
select
0.2 * avg(l_quantity)
from
lineitem
where
l_partkey = p_partkey
);

If we rewrite the query as follows, the scan rows of lineitem in subquery will much less than old q17
as well as the less row count for group node.

TPCH Q17(rewrite):
select
sum(l_extendedprice) / 7.0 as avg_yearly
from
lineitem,
part p1
where
p_partkey = l_partkey
and p_brand = 'Brand#23'
and p_container = 'MED BOX'
and l_quantity < (
select
0.2 * avg(l_quantity)
from
lineitem,
part p2
where
l_partkey = p1.p_partkey
and l_partkey = p2.p_partkey
and p2.p_brand = 'Brand#23'
and p2.p_container = 'MED BOX'
);

Although we might push some predicate to subquery to minimize the original scan row count, it will also introduce a new table and a new join node, which results more rows to scan, filter and join. In TPCH Q2 and Q17, the pushed table is relatively small, so get better performance after pushing. But normally, a CBO optimizer is needed to decide if the pushing is worth. At last, if table spool optimizer is implemented, we can always push the predicates.

Solution

  1. Push the predicate to subquery in some case.
  2. Add a session variable to control if enable this behavior.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

暂无答案!

目前还没有任何答案,快来回答吧!

相关问题