Higher cardinality column first in an index when involving a range?

Posted by ttygqcqt on 2021-06-25 in MySQL

CREATE TABLE `files` (
  `did` int(10) unsigned NOT NULL DEFAULT '0',
  `filename` varbinary(200) NOT NULL,
  `ext` varbinary(5) DEFAULT NULL,
  `fsize` double DEFAULT NULL,
  `filetime` datetime DEFAULT NULL,
  PRIMARY KEY (`did`,`filename`),
  KEY `fe` (`filetime`,`ext`),          -- This?
  KEY `ef` (`ext`,`filetime`)           -- or This?
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;

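There are a million rows in the table. The filetime values are mostly distinct. There is a limited number of ext values. So filetime has high cardinality and ext has much lower cardinality.
A query involves both ext and filetime: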

WHERE ext = '...'
  AND filetime BETWEEN ... AND ...

Which of these two indexes is better? And why?

mysql performance mariadb query-optimization indexing

Source: https://stackoverflow.com/questions/50239658/higher-cardinality-column-first-in-an-index-when-involving-a-range

1 Answer

eufgjt7s1#

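First, let's try FORCE INDEX to pick either ef or fe. The timings are too short to give a clear picture of which is faster, but EXPLAIN shows a difference:
Forcing the range on filetime first. (Note: the order of the conditions in WHERE has no effect.)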

mysql> EXPLAIN SELECT COUNT(*), AVG(fsize)
    FROM files FORCE INDEX(fe)
    WHERE ext = 'gif' AND filetime >= '2015-01-01'
                      AND filetime <  '2015-01-01' + INTERVAL 1 MONTH;
+----+-------------+-------+-------+---------------+------+---------+------+-------+-----------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows  | Extra                 |
+----+-------------+-------+-------+---------------+------+---------+------+-------+-----------------------+
|  1 | SIMPLE      | files | range | fe            | fe   | 14      | NULL | 16684 | Using index condition |
+----+-------------+-------+-------+---------------+------+---------+------+-------+-----------------------+

Forcing the low-cardinality ext first:

mysql> EXPLAIN SELECT COUNT(*), AVG(fsize)
    FROM files FORCE INDEX(ef)
    WHERE ext = 'gif' AND filetime >= '2015-01-01'
                      AND filetime <  '2015-01-01' + INTERVAL 1 MONTH;
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                 |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
|  1 | SIMPLE      | files | range | ef            | ef   | 14      | NULL |  538 | Using index condition |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+

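Clearly, the rows column says ef is better. But let's check with the optimizer trace. The output is rather bulky; I'll show only the interesting parts. No FORCE is needed; the trace shows both options and then picks the better one.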

...
             "potential_range_indices": [
                ...
                {
                  "index": "fe",
                  "usable": true,
                  "key_parts": [
                    "filetime",
                    "ext",
                    "did",
                    "filename"
                  ]
                },
                {
                  "index": "ef",
                  "usable": true,
                  "key_parts": [
                    "ext",
                    "filetime",
                    "did",
                    "filename"
                  ]
                }
              ],

...

"analyzing_range_alternatives": {
                "range_scan_alternatives": [
                  {
                    "index": "fe",
                    "ranges": [
                      "2015-01-01 00:00:00 <= filetime < 2015-02-01 00:00:00"
                    ],
                    "index_dives_for_eq_ranges": true,
                    "rowid_ordered": false,
                    "using_mrr": false,
                    "index_only": false,
                    "rows": 16684,
                    "cost": 20022,               <-- Here's the critical number
                    "chosen": true
                  },
                  {
                    "index": "ef",
                    "ranges": [
                      "gif <= ext <= gif AND 2015-01-01 00:00:00 <= filetime < 2015-02-01 00:00:00"
                    ],
                    "index_dives_for_eq_ranges": true,
                    "rowid_ordered": false,
                    "using_mrr": false,
                    "index_only": false,
                    "rows": 538,
                    "cost": 646.61,               <-- Here's the critical number
                    "chosen": true
                  }
                ],

...

"attached_conditions_computation": [
            {
              "access_type_changed": {
                "table": "`files`",
                "index": "ef",
                "old_type": "ref",
                "new_type": "range",
                "cause": "uses_more_keyparts"   <-- Also interesting
              }
            }

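With fe (range column first), the range could be used, but the optimizer estimated scanning through 16684 rows while looking for ext='gif'.
With ef (low-cardinality ext first), it could use both columns of the index and drill down more efficiently in the BTree. It then found an estimated 538 rows, all of which are useful for the query, so no further filtering is needed.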
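The difference between the two scans can be imitated outside MySQL. The following is a minimal sketch, not a model of InnoDB: it assumes made-up data (one file per hour, five extensions in rotation) and represents each composite index as a sorted list of key tuples searched with bisect. The names scan_fe and scan_ef are hypothetical helpers introduced only for this illustration.

```python
import bisect
from datetime import datetime, timedelta

# Hypothetical data: one file per hour, extensions cycling through five values.
EXTS = ["gif", "jpg", "png", "txt", "pdf"]
START = datetime(2014, 1, 1)
rows = [(START + timedelta(hours=i), EXTS[i % len(EXTS)]) for i in range(20000)]

# Each composite index is modeled as a sorted list of key tuples.
fe = sorted((filetime, ext) for filetime, ext in rows)  # like INDEX(filetime, ext)
ef = sorted((ext, filetime) for filetime, ext in rows)  # like INDEX(ext, filetime)


def scan_fe(index, ext, lo, hi):
    """Range on the leading column; ext must be checked entry by entry."""
    first = bisect.bisect_left(index, (lo, ""))
    last = bisect.bisect_left(index, (hi, ""))
    examined = last - first
    matched = sum(1 for _, e in index[first:last] if e == ext)
    return examined, matched


def scan_ef(index, ext, lo, hi):
    """Equality on the leading column, then a range on the second column."""
    first = bisect.bisect_left(index, (ext, lo))
    last = bisect.bisect_left(index, (ext, hi))
    examined = last - first
    return examined, examined  # every examined entry already matches


lo, hi = datetime(2015, 1, 1), datetime(2015, 2, 1)
print("fe (examined, matched):", scan_fe(fe, "gif", lo, hi))
print("ef (examined, matched):", scan_ef(ef, "gif", lo, hi))
```

In this toy model both scans return the same matches, but the (filetime, ext) ordering has to walk every entry in the date range, while the (ext, filetime) ordering touches only the matching entries. The absolute numbers depend entirely on the invented data and say nothing about the table in the question.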
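Conclusions: INDEX(filetime, ext) used only the first column. INDEX(ext, filetime) used both columns.
Put the column(s) involved in = tests first in the index, regardless of cardinality.
The query plan will not go beyond the first "range" column.
"Cardinality" is irrelevant for composite indexes and this type of query.
("Using index condition" means that the storage engine (InnoDB) will use columns of the index beyond the ones used for filtering.)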
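The rule about equality columns and the first range column can be written down as a few lines of code. This is a deliberately simplified sketch of the rule as stated in this answer, not of the MySQL optimizer, which is cost-based and has further features that are ignored here. The function name usable_key_parts is hypothetical.

```python
def usable_key_parts(index_columns, equality_columns, range_columns):
    """Return the leading index columns a range scan can use for lookup.

    Simplified rule: walk the index columns left to right, keep every
    column tested with '=', accept one range column, then stop.
    """
    used = []
    for column in index_columns:
        if column in equality_columns:
            used.append(column)  # equality: continue to the next key part
        elif column in range_columns:
            used.append(column)  # range: usable, but it ends the prefix
            break
        else:
            break  # no predicate on this column: stop here
    return used


# WHERE ext = '...' AND filetime BETWEEN ... AND ...
print(usable_key_parts(["filetime", "ext"], {"ext"}, {"filetime"}))  # ['filetime']
print(usable_key_parts(["ext", "filetime"], {"ext"}, {"filetime"}))  # ['ext', 'filetime']
```

Under this model, index fe stops after filetime, while index ef can use both ext and filetime, which matches the conclusion drawn from the optimizer trace.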

Answered 2021-06-25
