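I have an application for scheduling shifts for factory employees. I have a SQL query that finds out whether shifts conflict: if you have a shift from 08:00 to 16:00 and I assign you another shift from 09:00 to 17:00 on the same day, that is a conflict, because you can't be on two shifts at the same time.
Here is my SQL query: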
SELECT
`shifts`.`id`
FROM
`shifts`
INNER JOIN `shifts` `shifts_2` ON `shifts_2`.`employee_id` = `shifts`.`employee_id`
AND `shifts_2`.`start_at` < '2023-03-01 00:00:00'
AND `shifts_2`.`start_at` < `shifts`.`end_at`
AND `shifts_2`.`end_at` > '2023-01-31 23:59:00'
AND `shifts_2`.`end_at` > `shifts`.`start_at`
AND `shifts_2`.`id` != `shifts`.`id`
WHERE
`shifts`.`id` IN (22258796, 22258797);
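For readability I shortened the list of shift IDs in the last line, but since this list is dynamic, I have seen queries with 6k IDs there. When that happens, the query scans millions of rows and takes more than 10 seconds to return data.
Here is the structure of the table I am using: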
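One way to put a number on "scans millions of rows" is to read the session handler counters around the query. This is only a sketch: it assumes the session has the RELOAD privilege needed for FLUSH STATUS, and it uses the shortened two-ID list rather than the real 6k-ID list.

```sql
-- Reset the session status counters (assumes the RELOAD privilege).
FLUSH STATUS;

-- Run the query under investigation (same query as above, short ID list).
SELECT
  `shifts`.`id`
FROM
  `shifts`
  INNER JOIN `shifts` `shifts_2` ON `shifts_2`.`employee_id` = `shifts`.`employee_id`
  AND `shifts_2`.`start_at` < '2023-03-01 00:00:00'
  AND `shifts_2`.`start_at` < `shifts`.`end_at`
  AND `shifts_2`.`end_at` > '2023-01-31 23:59:00'
  AND `shifts_2`.`end_at` > `shifts`.`start_at`
  AND `shifts_2`.`id` != `shifts`.`id`
WHERE
  `shifts`.`id` IN (22258796, 22258797);

-- Show how many rows / index entries the storage engine was asked to read.
SHOW SESSION STATUS LIKE 'Handler_read%';
```

A large Handler_read_next value relative to the number of rows returned would suggest that most of the index entries read per employee are being discarded by the date conditions.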
CREATE TABLE `shifts` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`employee_id` int(11) NOT NULL,
`start_at` datetime NOT NULL,
`end_at` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `index_shifts_on_employee_id` (`employee_id`),
KEY `index_sm_shifts_on_employee_id_and_start_at_and_end_at` (`employee_id`,`start_at`,`end_at`),
KEY `index_sm_employee_id_id_start_end` (`employee_id`,`id`,`start_at`,`end_at`),
CONSTRAINT `fk_03a7d0ca25` FOREIGN KEY (`employee_id`) REFERENCES `employees` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=32677939 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
I am using MySQL 5.7.
If I run EXPLAIN FORMAT=JSON on the query, this is what I get:
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "62.36"
},
"nested_loop": [
{
"table": {
"table_name": "shifts",
"access_type": "range",
"possible_keys": [
"PRIMARY",
"index_shifts_on_employee_id",
"index_sm_shifts_on_employee_id_and_start_at_and_end_at",
"index_sm_employee_id_id_start_end"
],
"key": "PRIMARY",
"used_key_parts": [
"id"
],
"key_length": "8",
"rows_examined_per_scan": 2,
"rows_produced_per_join": 2,
"filtered": "100.00",
"cost_info": {
"read_cost": "2.41",
"eval_cost": "0.40",
"prefix_cost": "2.81",
"data_read_per_join": "2K"
},
"used_columns": [
"id",
"employee_id",
"start_at",
"end_at"
],
"attached_condition": "(`shifts`.`id` in (22258796,22258797))"
}
},
{
"table": {
"table_name": "shifts_2",
"access_type": "ref",
"possible_keys": [
"index_shifts_on_employee_id",
"index_sm_shifts_on_employee_id_and_start_at_and_end_at",
"index_sm_employee_id_id_start_end"
],
"key": "index_sm_employee_id_id_start_end",
"used_key_parts": [
"employee_id"
],
"key_length": "4",
"ref": [
"shifts.employee_id"
],
"rows_examined_per_scan": 141,
"rows_produced_per_join": 3,
"filtered": "1.11",
"using_index": true,
"cost_info": {
"read_cost": "3.02",
"eval_cost": "0.63",
"prefix_cost": "62.36",
"data_read_per_join": "3K"
},
"used_columns": [
"id",
"employee_id",
"start_at",
"end_at"
],
"attached_condition": "((`shifts_2`.`start_at` < '2023-03-01 00:00:00') and (`shifts_2`.`start_at` < `shifts`.`end_at`) and (`shifts_2`.`end_at` > '2023-01-31 23:59:00') and (`shifts_2`.`end_at` > `shifts`.`start_at`) and (`shifts_2`.`id` <> `shifts`.`id`))"
}
}
]
}
}
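How can I improve this to avoid scanning so many rows?
Thanks!
Edit: I need to identify which shifts conflict so that the user can delete them, which is why I need to return the IDs.
You can count in a subquery and then join back to the table to get the details: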
drop table if exists t;
create table t
(id int auto_increment primary key,
eid int,
stdt datetime,
enddt datetime);
insert into t(eid,stdt,enddt) values
(1,'2022-02-28 08:00:00','2022-02-28 16:00:00'),
(1,'2022-02-28 09:00:00','2022-02-28 17:00:00'),
(2,'2022-02-28 08:00:00','2022-02-28 16:00:00');
select t.*
from t
join
(
select eid,date(stdt) dt,count(*) cnt
from t
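-- group per employee and per day so that dt is well defined
-- (also required when ONLY_FULL_GROUP_BY is enabled, the MySQL 5.7 default)
group by eid,date(stdt) having cnt > 1
) cte on cte.eid = t.eid and cte.dt = date(t.stdt);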
+----+------+---------------------+---------------------+
| id | eid  | stdt                | enddt               |
+----+------+---------------------+---------------------+
|  1 |    1 | 2022-02-28 08:00:00 | 2022-02-28 16:00:00 |
|  2 |    1 | 2022-02-28 09:00:00 | 2022-02-28 17:00:00 |
+----+------+---------------------+---------------------+
2 rows in set (0.001 sec)
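Note that counting shifts per employee and day flags any two shifts on the same date, even when their times do not overlap. If a true overlap is what matters, a self-join on the interval condition from the question may be closer to the goal. Below is a minimal sketch of that idea; the table name shift_demo and its rows are made up for illustration, and it says nothing about performance on the real shifts table.

```sql
-- Hypothetical demo table, same shape as t above.
drop table if exists shift_demo;
create table shift_demo
(id int auto_increment primary key,
eid int,
stdt datetime,
enddt datetime);

insert into shift_demo(eid,stdt,enddt) values
(1,'2022-02-28 08:00:00','2022-02-28 16:00:00'),  -- overlaps the next row
(1,'2022-02-28 09:00:00','2022-02-28 17:00:00'),
(2,'2022-02-28 08:00:00','2022-02-28 16:00:00'),  -- same day as the next row,
(2,'2022-02-28 17:00:00','2022-02-28 20:00:00');  -- but no overlap

-- Two shifts overlap when each one starts before the other one ends.
select distinct a.*
from shift_demo a
join shift_demo b
  on b.eid = a.eid
 and b.id <> a.id
 and b.stdt < a.enddt
 and b.enddt > a.stdt
order by a.id;
```

With these rows the query should return only the two shifts of employee 1; employee 2 has two shifts on the same day, but the second starts after the first ends, so neither is reported.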