How do I make proper use of a composite index in a JOIN query?


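My question is how to get the MySQL optimizer to use a composite index in the most efficient way.

I am using MySQL Server 8 and have a MyISAM table that holds daily statistics for "cell" objects over a 2-year period. Each day has about 51,000 to 57,000 cells (rows). The table is very wide: about 860 counter columns. The database cannot be normalized because all columns are equally important. The query produces about 840 columns of statistics for a user-defined list of cells. Each column is a KPI calculated from one or more raw counters. The query joins the table of cluster definitions, "clusters_cust", with the main statistics table, "h_cell". Each cell in a cluster is matched with the statistics records for the same cell in "h_cell". The user defines a time period, and the report aggregates the results per cluster for each day in that period.

The query looks like this: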

SELECT cluster,Time,
ROUND(SUM(`counter1`)/SUM(`counter1`+`counter2`)*100,3) AS 'KPI1',
SUM(`counter1`) AS 'KPI2',
.......
SUM(`counterN`) AS 'KPI840'
FROM h_cell
INNER JOIN clusters_cust ON clusters_cust.cell = h_cell.cell
WHERE cluster='cluster62' AND Time>='2018-05-01' AND Time<='2018-06-30'
GROUP BY Time 

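Edit: following a comment from TheImpaler:

You are joining two tables and applying filters to both of them. The optimizer does not know whether it is better to access table #1 first and then scan table #2, or vice versa.

At the end of the question there are two modified variants of the query. Unfortunately, they perform either worse or the same.

The table "h_cell" has the following structure: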

mysql> SHOW CREATE TABLE h_cell;
CREATE TABLE `h_cell` (
  `Time` date NOT NULL,
  `Cell` char(8) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL DEFAULT '',
  `LocalCI` tinyint NOT NULL,
  `Integrity` varchar(6) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL,
  `counter1` int DEFAULT NULL,
  `counter2` int DEFAULT NULL,
  `counter3` double DEFAULT NULL,
  `counter4` float DEFAULT NULL,
  ...........
  `counter860` int DEFAULT NULL,
  PRIMARY KEY (`Cell`,`Time`) USING BTREE,
  KEY `Time` (`Time`,`LocalCI`) USING BTREE
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci ROW_FORMAT=DYNAMIC

The table "clusters_cust" has the following structure:

mysql> SHOW CREATE TABLE clusters_cust;
CREATE TABLE `clusters_cust` (
  `Cell` varchar(11) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL DEFAULT '',
  `Cluster` varchar(80) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL,
  `Comment` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL,
  PRIMARY KEY (`Cluster`,`Cell`),
  KEY `Cell` (`Cell`),
  KEY `Comment` (`Comment`,`Cluster`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

Indexes on the "h_cell" table:

mysql> SHOW INDEX FROM h_cell;
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table  | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| h_cell |          0 | PRIMARY  |            1 | Cell        | A         |       58258 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| h_cell |          0 | PRIMARY  |            2 | Time        | A         |    39090988 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| h_cell |          1 | Time     |            1 | Time        | A         |         730 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| h_cell |          1 | Time     |            2 | LocalCI     | A         |       15081 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+

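The primary key is meant to serve cell-level queries, which show the statistics for a given cell and time period. I hoped it would also help the cluster-level join query, but it does not.

Below is the EXPLAIN output for a 62-cell cluster and a 2-month reporting period. It appears that only the first member of the composite primary key, "Cell", is used, and the "Time" part is not: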

+----+-------------+---------------+------------+------+---------------------+---------+---------+------------------------------+------+----------+-------------------------------------------+
| id | select_type | table         | partitions | type | possible_keys       | key     | key_len | ref                          | rows | filtered | Extra                                     |
+----+-------------+---------------+------------+------+---------------------+---------+---------+------------------------------+------+----------+-------------------------------------------+
|  1 | SIMPLE      | clusters_cust | NULL       | ref  | PRIMARY,Cell        | PRIMARY | 322     | const                        |   63 |   100.00 | Using where; Using index; Using temporary |
|  1 | SIMPLE      | h_cell        | NULL       | ref  | PRIMARY,Time        | PRIMARY | 32      | ee_4g_hua.clusters_cust.Cell |  671 |     5.32 | Using index condition                     |
+----+-------------+---------------+------------+------+---------------------+---------+---------+------------------------------+------+----------+-------------------------------------------+
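
The reading of key_len above can be checked by hand. The following is a rough sketch of the arithmetic, assuming that key_len budgets 4 bytes per utf8mb4 character, that a VARCHAR key part adds a 2-byte length prefix, that DATE occupies 3 bytes, and that no NULL flag byte is added because all key columns are NOT NULL:

```sql
-- Back-of-the-envelope key_len check for the indexes shown in SHOW CREATE TABLE.
-- The byte sizes are the assumptions listed in the text above.
SELECT
    8 * 4      AS h_cell_pk_cell_bytes,         -- char(8) Cell: 32, the key_len reported for h_cell
    3          AS h_cell_pk_time_bytes,         -- date Time
    8 * 4 + 3  AS h_cell_pk_full_bytes,         -- 35 would indicate that both Cell and Time are used
    80 * 4 + 2 AS clusters_cust_cluster_bytes;  -- varchar(80) Cluster: 322, the key_len reported for clusters_cust
```

Under these assumptions, a key_len of 32 on h_cell corresponds to the "Cell" part alone. The "Using index condition" note in Extra suggests that the Time range is still evaluated against the index entries, just not as part of the key lookup.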

For a larger cluster of 3,000 cells with a 2-month reporting period the picture is the same: again only the first member, "Cell", is used:

+----+-------------+---------------+------------+------+---------------------+---------+---------+------------------------------+------+----------+-------------------------------------------+
| id | select_type | table         | partitions | type | possible_keys       | key     | key_len | ref                          | rows | filtered | Extra                                     |
+----+-------------+---------------+------------+------+---------------------+---------+---------+------------------------------+------+----------+-------------------------------------------+
|  1 | SIMPLE      | clusters_cust | NULL       | ref  | PRIMARY,Cell        | PRIMARY | 322     | const                        | 4067 |   100.00 | Using where; Using index; Using temporary |
|  1 | SIMPLE      | h_cell        | NULL       | ref  | PRIMARY,Time        | PRIMARY | 32      | ee_4g_hua.clusters_cust.Cell |  671 |     5.32 | Using index condition                     |
+----+-------------+---------------+------------+------+---------------------+---------+---------+------------------------------+------+----------+-------------------------------------------+

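But for the same 3,000-cell cluster and a shorter, 1-month reporting period, the primary key is not used at all. Instead another index, "Time", is used (that index was designed for a different type of query):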

+----+-------------+---------------+------------+--------+---------------------+---------+---------+-----------------------------+---------+----------+--------------------------+
| id | select_type | table         | partitions | type   | possible_keys       | key     | key_len | ref                         | rows    | filtered | Extra                    |
+----+-------------+---------------+------------+--------+---------------------+---------+---------+-----------------------------+---------+----------+--------------------------+
|  1 | SIMPLE      | h_cell        | NULL       | range  | PRIMARY,Time        | Time    | 3       | NULL                        | 1056817 |   100.00 | Using index condition    |
|  1 | SIMPLE      | clusters_cust | NULL       | eq_ref | PRIMARY,Cell        | PRIMARY | 368     | const,ee_4g_hua.h_cell.Cell |       1 |   100.00 | Using where; Using index |
+----+-------------+---------------+------------+--------+---------------------+---------+---------+-----------------------------+---------+----------+--------------------------+

For an even larger cluster of 20,000 cells with a 2-month reporting period, the primary key is again not used, and the "Time" index is used instead:

+----+-------------+---------------+------------+--------+---------------------+---------+---------+-----------------------------+---------+----------+--------------------------+
| id | select_type | table         | partitions | type   | possible_keys       | key     | key_len | ref                         | rows    | filtered | Extra                    |
+----+-------------+---------------+------------+--------+---------------------+---------+---------+-----------------------------+---------+----------+--------------------------+
|  1 | SIMPLE      | h_cell        | NULL       | range  | PRIMARY,Time        | Time    | 3       | NULL                        | 2080777 |   100.00 | Using index condition    |
|  1 | SIMPLE      | clusters_cust | NULL       | eq_ref | PRIMARY,Cell        | PRIMARY | 368     | const,ee_4g_hua.h_cell.Cell |       1 |   100.00 | Using where; Using index |
+----+-------------+---------------+------------+--------+---------------------+---------+---------+-----------------------------+---------+----------+--------------------------+

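I would like to know what I should change in my query or table design so that the optimizer can use both the "Cell" and "Time" members of the primary key index. Is that possible, or should there be another, more efficient index?

Edit:

  • Modified query No. 1: instead of a JOIN, I run a subquery in the WHERE clause to fetch the required list of cells and then apply the IN operator to its result.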
SELECT Time,
ROUND(SUM(`counter1`)/SUM(`counter1`+`counter2`)*100,3) AS 'KPI1',
SUM(`counter1`) AS 'KPI2',
.......
SUM(`counterN`) AS 'KPI840'
FROM h_cell
WHERE cell IN (SELECT cell FROM clusters_cust WHERE cluster='cluster20k') AND Time>='2018-05-01' AND Time<='2018-06-30'
GROUP BY Time 

EXPLAIN shows the same plan as with the JOIN, and again the "Time" index is used rather than the PRIMARY KEY:

+----+-------------+---------------+------------+--------+---------------------+---------+---------+-----------------------------+---------+----------+--------------------------+
| id | select_type | table         | partitions | type   | possible_keys       | key     | key_len | ref                         | rows    | filtered | Extra                    |
+----+-------------+---------------+------------+--------+---------------------+---------+---------+-----------------------------+---------+----------+--------------------------+
|  1 | SIMPLE      | h_cell        | NULL       | range  | PRIMARY,Time        | Time    | 3       | NULL                        | 2080777 |   100.00 | Using index condition    |
|  1 | SIMPLE      | clusters_cust | NULL       | eq_ref | PRIMARY,Cell        | PRIMARY | 368     | const,ee_4g_hua.h_cell.Cell |       1 |   100.00 | Using where; Using index |
+----+-------------+---------------+------------+--------+---------------------+---------+---------+-----------------------------+---------+----------+--------------------------+
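  • Modified query No. 2: again without a JOIN, with a subquery that takes the required list of cells directly from a pre-filtered table, "clust". The IN operator is used again.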
SELECT Time,
ROUND(SUM(`counter1`)/SUM(`counter1`+`counter2`)*100,3) AS 'KPI1',
SUM(`counter1`) AS 'KPI2',
.......
SUM(`counterN`) AS 'KPI840'
FROM h_cell
WHERE cell IN (SELECT * FROM clust) and Time>='2018-05-01' and Time<='2018-06-30' 
GROUP BY Time 

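The EXPLAIN result is below. This time the PRIMARY KEY index is used, but only its first column, 'cell', rather than both columns, 'cell' and 'Time'. The execution time was so long that I could not wait for the query to finish.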

+----+--------------+-------------+------------+------+---------------------+---------+---------+------------------+-------+----------+------------------------------+
| id | select_type  | table       | partitions | type | possible_keys       | key     | key_len | ref              | rows  | filtered | Extra                        |
+----+--------------+-------------+------------+------+---------------------+---------+---------+------------------+-------+----------+------------------------------+
|  1 | SIMPLE       | <subquery2> | NULL       | ALL  | NULL                | NULL    | NULL    | NULL             |  NULL |   100.00 | Using where; Using temporary |
|  1 | SIMPLE       | h_cell      | NULL       | ref  | PRIMARY,Time        | PRIMARY | 32      | <subquery2>.cell |   671 |     5.32 | Using index condition        |
|  2 | MATERIALIZED | clust       | NULL       | ALL  | NULL                | NULL    | NULL    | NULL             | 20000 |   100.00 | NULL                         |
+----+--------------+-------------+------------+------+---------------------+---------+---------+------------------+-------+----------+------------------------------+
mysql inner-join
1 Answer

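Since the alternatives did not work, I suggest this one instead.

On your h_cell table, have an index on (Time, Cell), that is, the same two columns as the existing primary key but in the opposite order.

For the query, I also added the keyword STRAIGHT_JOIN. I also qualified every column with its table alias, so that it is easier to track which column comes from which table.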
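
As a minimal sketch, the suggested index could be created as follows. The index name idx_time_cell is hypothetical and chosen only for illustration; the column names come from the SHOW CREATE TABLE output in the question.

```sql
-- Hypothetical index name. The column order is (Time, Cell),
-- the reverse of the existing PRIMARY KEY (Cell, Time).
ALTER TABLE h_cell ADD INDEX idx_time_cell (`Time`, `Cell`);

-- Confirm that the new index is listed alongside PRIMARY and Time.
SHOW INDEX FROM h_cell;
```

With roughly 39 million rows in h_cell (the cardinality reported for the primary key), building the index will probably take a while, so it may be worth trying on a copy of the table first.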

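Now, even by your own description of the data, 51-57k records per day over about 60 days means the query still has to read and aggregate on the order of 3 million records (51,000 × 60 ≈ 3.06 million). Since you only care about 'cluster62' in this example, I assume the actual count is lower. Also, for query clarity, you do not need to wrap every column in tick (backtick) characters. Writing table.column or alias.column gives the engine an explicit qualification that prevents ambiguity about where the data comes from. For example, TIME may be a reserved word, but h.time is unambiguously the column in h (the alias for the h_cell table).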

SELECT STRAIGHT_JOIN
        cc.cluster,
        h.Time,
        ROUND(  SUM( h.counter1 ) / SUM( h.counter1 + h.counter2 ) * 100,3) AS KPI1,
        SUM( h.counter1 ) AS KPI2,
        .......
        SUM( h.counterN ) AS KPI840
    FROM 
        h_cell h
            INNER JOIN clusters_cust cc
                ON  cc.cluster = 'cluster62' 
                AND h.cell = cc.cell
    WHERE 
            h.Time >= '2018-05-01' 
        AND h.Time <= '2018-06-30'
    GROUP BY 
        h.Time 
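
To see whether the optimizer actually picks up the new index, the same statement can be prefixed with EXPLAIN. The sketch below trims the select list to a single KPI for brevity; that trimming is an illustration only and is not part of the suggested query.

```sql
-- Trimmed version of the query above, used only to inspect the plan.
EXPLAIN
SELECT STRAIGHT_JOIN
        h.Time,
        SUM( h.counter1 ) AS KPI2
    FROM
        h_cell h
            INNER JOIN clusters_cust cc
                ON  cc.cluster = 'cluster62'
                AND h.cell = cc.cell
    WHERE
            h.Time >= '2018-05-01'
        AND h.Time <= '2018-06-30'
    GROUP BY
        h.Time;
```

In the output, the key and key_len columns of the h_cell row show which index was chosen and how many bytes of it are used for the lookup. Because STRAIGHT_JOIN forces the tables to be read in the order they are listed, h_cell should appear as the first row.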