incubator-doris [Enhancement] [Vectorized] optimize pre-agg in stream load memtable

Posted by cdmah0mi on 2022-04-22 in Java


Search before asking

I have searched the issues and found no similar issues.

Description

The current version (#8561) of vectorized stream load uses a skip list to aggregate values. A skip list is a row-oriented data structure that is kept in sorted order at all times.

Since the memtable is only used for pre-aggregation and queries never read from it, maintaining sorted order on every insert is too expensive.

I plan to refactor this part of the code and replace the existing skip list with another approach, building on PR #8561.

Solution

Solution 1:
First, sort the incoming data block.
Then, merge rows with equal keys in the sorted block.
Then, append the merged block to the final block.
Finally, finalize the final block (sort + merge) before flushing it.
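
A minimal sketch of the Solution 1 flow, in Python for brevity rather than as Doris code. It assumes each row is a (key, value) pair and that the aggregate function is SUM; the names `sort_and_merge` and `SortMergeMemTable` are hypothetical and only illustrate the steps.

```python
from itertools import groupby
from operator import itemgetter


def sort_and_merge(rows):
    """Sort rows by key, then collapse rows with equal keys (SUM is an assumption)."""
    ordered = sorted(rows, key=itemgetter(0))
    return [
        (key, sum(value for _, value in group))
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


class SortMergeMemTable:
    """Hypothetical memtable: per-block sort + merge, with a final sort + merge on flush."""

    def __init__(self):
        self.final_block = []

    def insert(self, block):
        # Steps 1-3: sort the incoming block, merge equal keys, append the result.
        self.final_block.extend(sort_and_merge(block))

    def flush(self):
        # Step 4: the final block holds several sorted runs, so sort + merge once more.
        result = sort_and_merge(self.final_block)
        self.final_block = []
        return result


mt = SortMergeMemTable()
mt.insert([("b", 1), ("a", 2), ("b", 3)])
mt.insert([("a", 5), ("c", 1)])
print(mt.flush())
```

Note that the final block is not globally sorted between inserts: it is a concatenation of sorted, merged runs, and order is only established once at flush time.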

Solution 2:
First, aggregate each incoming data block using a hash table.
Finally, sort the whole aggregated block before flushing it.
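
Solution 2 could be sketched the same way. This is again illustrative Python under the same assumptions ((key, value) rows, SUM as the aggregate function), with a plain `dict` standing in for the hash table; `HashAggMemTable` is a hypothetical name.

```python
class HashAggMemTable:
    """Hypothetical memtable: hash aggregation on insert, a single sort on flush."""

    def __init__(self):
        self.agg = {}

    def insert(self, block):
        # Aggregate every incoming row into the hash table; no ordering is kept.
        for key, value in block:
            self.agg[key] = self.agg.get(key, 0) + value

    def flush(self):
        # Keys are unique after aggregation, so sorting the items orders them by key.
        result = sorted(self.agg.items())
        self.agg = {}
        return result


mt = HashAggMemTable()
mt.insert([("b", 1), ("a", 2), ("b", 3)])
mt.insert([("a", 5), ("c", 1)])
print(mt.flush())
```

Compared with the sort-and-merge sketch, inserts do no sorting at all, and the only sort runs once over the already aggregated keys.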

Are you willing to submit PR?

Yes I am willing to submit a PR!

Code of Conduct

I agree to follow this project's Code of Conduct

Source: https://github.com/apache/incubator-doris/issues/8656
