incubator-doris [Enhancement] [Vectorized] optimize pre-agg in stream load memtable

cdmah0mi  posted on 2022-04-22  in  Java

Search before asking

  • I had searched in the issues and found no similar issues.

Description

The current version (#8561) of vectorized stream load uses a skip list to aggregate values. A skip list is a row-oriented structure that keeps its contents sorted at all times.

Because the memtable only performs pre-aggregation and queries never read from it, keeping the data sorted on every insert is more expensive than necessary.

I plan to refactor this part of the code, building on PR #8561, and replace the existing skip list with one of the solutions below.

Solution

Solution 1:
First, sort the incoming data block.
Then, merge the sorted data block so that rows with the same key are aggregated.
Then, append the merged data block to the final block.
Finally, finalize the final block (sort + merge) before flushing it.
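
The steps of Solution 1 can be sketched as follows. This is only an illustration in Python, not the Doris implementation: the two-column (key, value) row layout, the SUM aggregation, and the names sort_and_merge and SortMergeMemTable are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

# Hypothetical schema: one key column and one value column aggregated with SUM.
Row = Tuple[str, int]


def sort_and_merge(rows: List[Row], agg: Callable[[int, int], int]) -> List[Row]:
    """Sort rows by key, then collapse each run of equal keys into one row."""
    merged: List[Row] = []
    # sorted() is stable, so rows with equal keys keep their arrival order.
    for key, value in sorted(rows, key=lambda r: r[0]):
        if merged and merged[-1][0] == key:
            merged[-1] = (key, agg(merged[-1][1], value))
        else:
            merged.append((key, value))
    return merged


class SortMergeMemTable:
    """Illustrative memtable: per-block sort + merge, final sort + merge on flush."""

    def __init__(self, agg: Callable[[int, int], int] = lambda a, b: a + b):
        self._agg = agg
        self._final_block: List[Row] = []

    def insert_block(self, block: List[Row]) -> None:
        # Steps 1-3: sort the incoming block, merge it, append to the final block.
        self._final_block.extend(sort_and_merge(block, self._agg))

    def flush(self) -> List[Row]:
        # Step 4: the final block is a concatenation of sorted runs, so it
        # needs one more sort + merge before it can be written out.
        result = sort_and_merge(self._final_block, self._agg)
        self._final_block = []
        return result
```

In this sketch the final block is never globally ordered until flush, so the cost of ordering is paid once per block and once per flush instead of once per row. Merging in stages like this assumes the aggregation can be applied to partial results (as SUM, MIN, and MAX can).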

Solution 2:
First, aggregate each incoming data block with a hash table.
Finally, sort the whole aggregated block before flushing it.
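
A comparable sketch of Solution 2, again in Python and again only under assumptions: the (key, value) row layout, the SUM aggregation, and the name HashAggMemTable are hypothetical and do not come from the Doris code base.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical schema: one key column and one value column aggregated with SUM.
Row = Tuple[str, int]


class HashAggMemTable:
    """Illustrative memtable: hash aggregation on insert, one sort on flush."""

    def __init__(self, agg: Callable[[int, int], int] = lambda a, b: a + b):
        self._agg = agg
        self._table: Dict[str, int] = {}

    def insert_block(self, block: List[Row]) -> None:
        # Step 1: aggregate every incoming row into the hash table by key.
        for key, value in block:
            if key in self._table:
                self._table[key] = self._agg(self._table[key], value)
            else:
                self._table[key] = value

    def flush(self) -> List[Row]:
        # Step 2: keys are already unique, so a single sort by key is enough.
        result = sorted(self._table.items(), key=lambda kv: kv[0])
        self._table = {}
        return result
```

Here no ordering work happens on the insert path at all; the only sort runs at flush time over already-aggregated rows, at the price of hash-table memory and lookups per row.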

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct
