MySQL query with a self-join scans millions of rows

Problem description

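I have an application for factory employees who work in shifts. I have an SQL query that finds out whether shifts conflict: if you have an 08:00-16:00 shift and I assign you another 09:00-17:00 shift on the same day, that is a conflict, because you can't be on two shifts at the same time.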
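The conflict rule above comes down to the usual interval-overlap test: two shifts overlap when each one starts before the other ends. A minimal sketch using the two example shifts (the date 2023-02-01 is an assumption chosen only for illustration):

```sql
-- Shift A: 08:00-16:00, shift B: 09:00-17:00 (hypothetical date).
SELECT
  TIMESTAMP '2023-02-01 08:00:00' < TIMESTAMP '2023-02-01 17:00:00'      -- A starts before B ends
  AND TIMESTAMP '2023-02-01 16:00:00' > TIMESTAMP '2023-02-01 09:00:00'  -- A ends after B starts
  AS is_conflict;
-- is_conflict = 1
```

With strict comparisons, a shift that ends exactly when the next one begins is not treated as a conflict.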

Here is my SQL query:

SELECT 
  `shifts`.`id` 
FROM 
  `shifts` 
  INNER JOIN `shifts` `shifts_2` ON `shifts_2`.`employee_id` = `shifts`.`employee_id` 
  AND `shifts_2`.`start_at` < '2023-03-01 00:00:00' 
  AND `shifts_2`.`start_at` < `shifts`.`end_at` 
  AND `shifts_2`.`end_at` > '2023-01-31 23:59:00' 
  AND `shifts_2`.`end_at` > `shifts`.`start_at` 
  AND `shifts_2`.`id` != `shifts`.`id` 
WHERE 
  `shifts`.`id` IN (22258796, 22258797);

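For readability I shortened the list of shift IDs on the last line, but the list is dynamic and I have seen queries with 6k IDs there. When that happens, the query scans millions of rows and takes just over 10 seconds to return data.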

Here is the structure of the table I'm using:

CREATE TABLE `shifts` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `employee_id` int(11) NOT NULL,
  `start_at` datetime NOT NULL,
  `end_at` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `index_shifts_on_employee_id` (`employee_id`),
  KEY `index_sm_shifts_on_employee_id_and_start_at_and_end_at` (`employee_id`,`start_at`,`end_at`),
  KEY `index_sm_employee_id_id_start_end` (`employee_id`,`id`,`start_at`,`end_at`),
  CONSTRAINT `fk_03a7d0ca25` FOREIGN KEY (`employee_id`) REFERENCES `employees` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=32677939 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

I'm using MySQL 5.7.

If I run EXPLAIN FORMAT=JSON on the query, this is what I get:

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "62.36"
    },
    "nested_loop": [
      {
        "table": {
          "table_name": "shifts",
          "access_type": "range",
          "possible_keys": [
            "PRIMARY",
            "index_shifts_on_employee_id",
            "index_sm_shifts_on_employee_id_and_start_at_and_end_at",
            "index_sm_employee_id_id_start_end"
          ],
          "key": "PRIMARY",
          "used_key_parts": [
            "id"
          ],
          "key_length": "8",
          "rows_examined_per_scan": 2,
          "rows_produced_per_join": 2,
          "filtered": "100.00",
          "cost_info": {
            "read_cost": "2.41",
            "eval_cost": "0.40",
            "prefix_cost": "2.81",
            "data_read_per_join": "2K"
          },
          "used_columns": [
            "id",
            "employee_id",
            "start_at",
            "end_at"
          ],
          "attached_condition": "(`shifts`.`id` in (22258796,22258797))"
        }
      },
      {
        "table": {
          "table_name": "shifts_2",
          "access_type": "ref",
          "possible_keys": [
            "index_shifts_on_employee_id",
            "index_sm_shifts_on_employee_id_and_start_at_and_end_at",
            "index_sm_employee_id_id_start_end"
          ],
          "key": "index_sm_employee_id_id_start_end",
          "used_key_parts": [
            "employee_id"
          ],
          "key_length": "4",
          "ref": [
            "shifts.employee_id"
          ],
          "rows_examined_per_scan": 141,
          "rows_produced_per_join": 3,
          "filtered": "1.11",
          "using_index": true,
          "cost_info": {
            "read_cost": "3.02",
            "eval_cost": "0.63",
            "prefix_cost": "62.36",
            "data_read_per_join": "3K"
          },
          "used_columns": [
            "id",
            "employee_id",
            "start_at",
            "end_at"
          ],
          "attached_condition": "((`shifts_2`.`start_at` < '2023-03-01 00:00:00') and (`shifts_2`.`start_at` < `shifts`.`end_at`) and (`shifts_2`.`end_at` > '2023-01-31 23:59:00') and (`shifts_2`.`end_at` > `shifts`.`start_at`) and (`shifts_2`.`id` <> `shifts`.`id`))"
        }
      }
    ]
  }
}
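Reading the plan above: `shifts_2` is reached through `index_sm_employee_id_id_start_end`, but only the `employee_id` key part is used, and `filtered` is 1.11. In other words, for every shift in the IN list MySQL reads all index entries of that employee (estimated at 141) and applies the date and overlap conditions afterwards. With 6k IDs and that estimate this is already roughly 6000 × 141 = 846,000 index entries, and more for employees with a long history. As a rough diagnostic sketch (not part of the original post), the real number of shifts per affected employee could be checked like this:

```sql
-- How many rows does the ref lookup on employee_id have to read
-- for each employee behind the IDs in the list?
SELECT s.employee_id, COUNT(*) AS shifts_per_employee
FROM shifts AS s
WHERE s.employee_id IN (
  SELECT employee_id FROM shifts WHERE id IN (22258796, 22258797)
)
GROUP BY s.employee_id
ORDER BY shifts_per_employee DESC;
```

If the counts are much larger than 141, the optimizer's estimate is low and the actual rows examined will be correspondingly higher.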

How can I improve this to avoid scanning so many rows?

Thanks!

Edit: I need to identify which shifts conflict so that users can delete them, which is why I need to return the IDs.
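Since only the IDs of conflicting shifts are needed, one possible variation is to express the check with EXISTS instead of an inner join. This is a sketch, not a confirmed fix: the aliases `s` and `s2` are my own, and whether it actually reduces the rows scanned depends on the data and should be verified with EXPLAIN.

```sql
-- Return each shift from the list at most once if any other shift
-- of the same employee overlaps it inside the date window.
SELECT s.id
FROM shifts AS s
WHERE s.id IN (22258796, 22258797)
  AND EXISTS (
    SELECT 1
    FROM shifts AS s2
    WHERE s2.employee_id = s.employee_id
      AND s2.id <> s.id
      AND s2.start_at < s.end_at
      AND s2.end_at > s.start_at
      AND s2.start_at < '2023-03-01 00:00:00'
      AND s2.end_at > '2023-01-31 23:59:00'
  );
```

Unlike the join, this form does not return the same ID several times when a shift conflicts with more than one other shift, and the subquery can stop at the first overlapping row it finds.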

mysql performance query-optimization self-join
1 Answer

You can count per employee and day in a subquery, then join back to the table to get the details:

drop table if exists t;

create table t
(id int auto_increment primary key,
eid int,
stdt datetime,
enddt datetime);

insert into t(eid,stdt,enddt) values
(1,'2022-02-28 08:00:00','2022-02-28 16:00:00'),
(1,'2022-02-28 09:00:00','2022-02-28 17:00:00'),
(2,'2022-02-28 08:00:00','2022-02-28 16:00:00');

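-- Find employees with more than one shift starting on the same day,
-- then join back to t to list those shifts.
select t.*
from t
join
(
select eid,date(stdt) dt,count(*) cnt
from t
group by eid,date(stdt) having cnt > 1  -- group by day as well (required under ONLY_FULL_GROUP_BY)
) cte on cte.eid = t.eid and cte.dt = date(t.stdt);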

+----+------+---------------------+---------------------+
| id | eid  | stdt                | enddt               |
+----+------+---------------------+---------------------+
|  1 |    1 | 2022-02-28 08:00:00 | 2022-02-28 16:00:00 |
|  2 |    1 | 2022-02-28 09:00:00 | 2022-02-28 17:00:00 |
+----+------+---------------------+---------------------+
2 rows in set (0.001 sec)
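Note that counting shifts per day flags any two shifts that start on the same day, even when their times do not overlap, and it does not catch an overnight shift that runs into the next day's shift. As a hedged alternative sketch on the same sample table `t`, the overlapping pairs can be listed directly with the interval-overlap test (the aliases `a` and `b` and the output column names are my own):

```sql
-- List each pair of overlapping shifts of the same employee once.
SELECT a.id AS shift_id, b.id AS conflicts_with, a.eid
FROM t AS a
JOIN t AS b
  ON  b.eid = a.eid
  AND a.id < b.id          -- report each pair only once
  AND b.stdt < a.enddt     -- b starts before a ends
  AND b.enddt > a.stdt;    -- b ends after a starts
```

With the three sample rows above this returns a single pair, shifts 1 and 2 of employee 1.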