我有一个使用以下模式的数据库：

CREATE TABLE IF NOT EXISTS `sessions` (
  `starttime` datetime NOT NULL,
  `ip` varchar(15) NOT NULL default '',
  `country_name` varchar(45) default '',
  `country_iso_code` varchar(2) default '',
  `org` varchar(128) default '',
  KEY (`ip`),
  KEY (`starttime`),
  KEY (`country_name`)
);

（实际表包含更多列；我只包含了我查询的列。）引擎是innodb。
正如你所看到的，有3个索引 ip , starttime ，和 country_name .
这个表非常大-它包含大约150万行。我正在对它运行各种查询，试图提取一个月的信息（在下面的示例中是2018年8月）。
像这样的问题

SELECT
  UNIX_TIMESTAMP(starttime) as time_sec,
  country_iso_code AS metric,
  COUNT(country_iso_code) AS value
FROM
  sessions
WHERE
  starttime >= FROM_UNIXTIME(1533070800) AND
  starttime <= FROM_UNIXTIME(1535749199)
GROUP BY metric;

速度相当慢，但可以承受（几十秒），尽管没有索引 country_iso_code .
（忽略第一件事） SELECT ; 我知道这似乎没有意义，但它是使用查询结果的工具所必需的。同样，忽略 FROM_UNIXTIME() 而不是日期字符串；这部分查询是自动生成的，我无法控制它。）
但是，像这样的查询

SELECT
  country_name AS Country,
  COUNT(country_name) AS Attacks
FROM
  sessions
WHERE
  starttime >= FROM_UNIXTIME(1533070800) AND
  starttime <= FROM_UNIXTIME(1535749199)
GROUP BY Country;

速度慢得让人无法忍受——我让它运行了大约半个小时，然后放弃了，没有得到任何结果。
结果来自 EXPLAIN :

+----+-------------+----------+------------+-------+------------------------------------+--------------+---------+------+----------+----------+-------------+
| id | select_type | table    | partitions | type  | possible_keys                      | key          | key_len | ref  | rows     | filtered | Extra       |
+----+-------------+----------+------------+-------+------------------------------------+--------------+---------+------+----------+----------+-------------+
|  1 | SIMPLE      | sessions | NULL       | index | starttime,starttime_2,country_name | country_name | 138     | NULL | 14771687 |    35.81 | Using where |
+----+-------------+----------+------------+-------+------------------------------------+--------------+---------+------+----------+----------+-------------+

到底是什么问题？我应该索引其他内容吗？或许是一个综合指数( starttime , country_name )? 我读过这本指南，但也许我误解了？
下面是一些同样缓慢的查询，可能也遇到同样的问题：
查询#2：

SELECT
  ip AS IP,
  COUNT(ip) AS Attacks
FROM
  sessions
WHERE
  starttime >= FROM_UNIXTIME(1533070800) AND
  starttime <= FROM_UNIXTIME(1535749199)
GROUP BY ip;

结果来自 EXPLAIN :

+----+-------------+----------+------------+-------+--------------------------+------+---------+------+----------+----------+-------------+
| id | select_type | table    | partitions | type  | possible_keys            | key  | key_len | ref  | rows     | filtered | Extra       |
+----+-------------+----------+------------+-------+--------------------------+------+---------+------+----------+----------+-------------+
|  1 | SIMPLE      | sessions | NULL       | index | starttime,ip,starttime_2 | ip   | 47      | NULL | 14771780 |    35.81 | Using where |
+----+-------------+----------+------------+-------+--------------------------+------+---------+------+----------+----------+-------------+

查询#3：

SELECT
  org AS Organization,
  COUNT(org) AS Attacks
FROM
  sessions
WHERE
  starttime >= FROM_UNIXTIME(1533070800) AND
  starttime <= FROM_UNIXTIME(1535749199)
GROUP BY Organization;

结果来自 EXPLAIN :

+----+-------------+----------+------------+-------+---------------------------+------+---------+------+----------+----------+-------------+
| id | select_type | table    | partitions | type  | possible_keys             | key  | key_len | ref  | rows     | filtered | Extra       |
+----+-------------+----------+------------+-------+---------------------------+------+---------+------+----------+----------+-------------+
|  1 | SIMPLE      | sessions | NULL       | index | starttime,starttime_2,org | org  | 387     | NULL | 14771800 |    35.81 | Using where |
+----+-------------+----------+------------+-------+---------------------------+------+---------+------+----------+----------+-------------+

查询#4：

SELECT
  ip AS IP,
  country_name AS Country,
  city_name AS City,
  org AS Organization,
  COUNT(ip) AS Attacks
FROM
  sessions
WHERE
  starttime >= FROM_UNIXTIME(1533070800) AND
  starttime <= FROM_UNIXTIME(1535749199)
GROUP BY ip;

结果来自 EXPLAIN :

+----+-------------+----------+------------+-------+--------------------------+------+---------+------+----------+----------+-------------+
| id | select_type | table    | partitions | type  | possible_keys            | key  | key_len | ref  | rows     | filtered | Extra       |
+----+-------------+----------+------------+-------+--------------------------+------+---------+------+----------+----------+-------------+
|  1 | SIMPLE      | sessions | NULL       | index | starttime,ip,starttime_2 | ip   | 47      | NULL | 14771914 |    35.81 | Using where |
+----+-------------+----------+------------+-------+--------------------------+------+---------+------+----------+----------+-------------+

更好的是。。。
请注意，您没有 PRIMARY KEY ; 那太淘气了。pk并不能从本质上提高性能，但是pk是从 starttime 威尔。让我们这样做：

CREATE TABLE IF NOT EXISTS `sessions` (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,   -- note
  `starttime` datetime NOT NULL,
  `ip` varchar(39) NOT NULL CHARACTER SET ascii default '',  -- note
  `country_name` varchar(45) default '',
  `country_iso_code` char(2) CHARACTER SET ascii  default '',  -- note
  `org` varchar(128) default '',
  PRIMARY KEY(starttime, id)  -- in this order
  INDEX(id)                   -- to keep AUTO_INCREMENT happy
  -- The rest are unnecessary for the queries in question:
  KEY (`ip`),
  KEY (`starttime`),
  KEY (`country_name`)
) ENGINE=InnoDB;        -- just in case you are accidentally getting MyISAM

为什么？这将利用pk与数据的“聚集”。这样，只扫描表中时间范围内的一部分。而且索引和数据之间不会有跳跃。而且您不需要很多索引来有效地处理所有情况。
ipv6最多需要39个字节。请注意 VARCHAR 不会让你做任何范围（cdr）测试。我可以进一步讨论你喜欢的。

2条答案

按热度按时间

enyaitl31#

赞(0）回复(0）举报 2021-06-20

mrphzbgm2#

一般来说，查询

SELECT column, COUNT(column)
    FROM tbl
   WHERE datestamp >= a AND datestamp <= b
   GROUP BY column

当表上有复合索引时，性能最佳 (datestamp, column) . 为什么？它们可以通过索引扫描来满足，而不需要读取表中的所有行。
换句话说，可以通过随机访问索引（到日期戳的第一个值）来定位查询的第一个相关行。然后，mysql可以按顺序读取索引，并对索引中的各种值进行计数 column 直到它到达最后一行。不需要读取实际的表；仅从索引即可满足查询。这使它更快。

UPDATE TABLE tbl ADD INDEX date_col (datestamp, column);

为您创建索引。
小心两件事。第一：单列索引不一定有助于聚合查询性能。
第二：在没有看到整个查询的情况下，很难猜出用于进行索引扫描的正确索引。简化的查询通常会导致索引过于简化。

使用groupby的mysql查询速度非常慢

2条答案

相关问题

热门标签

最新问答