clickhouse数据库嵌套结构中的主键

t5zmwmid 于 2021-07-15 发布在 ClickHouse

关注(0)|答案(1)|浏览(408)

在clickhouse中，我创建了一个嵌套结构的表

CREATE TABLE IF NOT EXISTS table_name (
    timestamp Date,
    str_1 String,
    Nested_structure Nested (
        index_array UInt32,
        metric_2 UInt64,
        metric_3 UInt8
    ),
    sign Int8 DEFAULT 1
) ENGINE = CollapsingMergeTree(sign) PARTITION BY (toYYYYMM(timestamp)) ORDER BY (timestamp, str_1)

我将提出的问题如下：

SELECT count(*) AS count FROM table_name
 WHERE (timestamp = '2017-09-01')
 AND
 arrayFirst((i, x) -> x = 7151, Nested_structure.metric_2, Nested_structure.index_array) > 50000

我想计算str_1，其中：索引中与值7151的索引_数组匹配的（数组）列度量_2的值大于给定的阈值（50000）
我想知道是否有可能为列设置一个主键：index\u array以加快查询速度。
如果在order by子句中添加column:nested\u structure.index\u array，则假定它是大表的数组列，而不是嵌套\u结构的列索引\u数组的单个值
例如 `ORDER BY (timestamp, str_1, Nested_structure.index_array)` 算法是：
在索引数组中搜索给定值的索引
使用步骤（1）中的索引，从其他数组中检索值
如果索引数组被排序并且表知道这一点，那么步骤（1）可能会更快（例如使用二进制搜索算法）
有人有主意吗？

编辑

列基数：str\u 1 150000000个不同值索引\u数组：15000-20000个不同值
假设索引\u数组distinct值为：列\u 1，…，列\u 15000，则非规范化表应具有以下结构：

timestamp,
str_1,
column_1a, <--  store values for metric_2
...
column_15000a, <--  store values for metric_2
column_1b, <--  store values for metric_3
...
column_15000b, <--  store values for metric_3

@如果我使用lowcardinality类型的列，您能告诉我这个表的结构吗？

clickhouse primary-key Nested

来源：https://stackoverflow.com/questions/53838162/primary-key-in-nested-structure-on-clickhouse-database

1条答案

按热度按时间

tkclm6bt1#

我想知道是否有可能为列设置一个主键：index\u array以加快查询速度。
不，clickhouse没有数组索引。如果你提供 Nested_structure.index_array 作为第三个论点 order by 子句，它将考虑数组列对整行进行排序。注意， [1,2] < [1,2,3] .
您只需在不使用嵌套列的情况下对表进行反规范化，并使前两列的类型为 LowCardinality 几乎可以生产了。

更新

看来你不会从中得到什么好处 LowCardinality 类型。我的意思是这样做

CREATE TABLE IF NOT EXISTS table_name (
    timestamp Date,
    str_1 String,
    index_array UInt32,
    metric_2 UInt64,
    metric_3 UInt8,
    sign Int8 DEFAULT 1
) ENGINE = CollapsingMergeTree(sign) PARTITION BY (toYYYYMM(timestamp)) ORDER BY (timestamp, str_1, index_array)

通过这样做，您仍然可以使用旧的插入逻辑

CREATE TABLE IF NOT EXISTS table_name ( timestamp Date, str_1 String, index_array UInt32, metric_2 UInt64, metric_3 UInt8, sign Int8 DEFAULT 1 ) ENGINE = CollapsingMergeTree(sign) PARTITION BY (toYYYYMM(timestamp)) ORDER BY (timestamp, str_1, index_array)

CREATE TABLE IF NOT EXISTS source_table ( timestamp Date, str_1 String, Nested_structure Nested ( index_array UInt32, metric_2 UInt64, metric_3 UInt8 ), sign Int8 DEFAULT 1 ) ENGINE Null;

create materialized view data_pipe to table_name as select timestamp, str_1, Nested_structure.index_array index_array, Nested_structure.metric_2 metric_2, Nested_structure.metric_3 metric_3, sign from source_table array join Nested_structure;

insert into source_table values (today(), 'fff', [1,2,3], [2,3,4], [3,4,5], 1);

赞(0）回复(0）举报 2021-07-15

我来回答

clickhouse数据库嵌套结构中的主键

编辑

1条答案

更新

相关问题

热门标签

最新问答