我对 MySQL 完全陌生,非常感谢您的耐心。 我有以下 MySQL 表:
事实ID | 来自_id | 关系_id | to_id | 链接时间 |
---|---|---|---|---|
1 | 19 | 6 | 151 | 1 |
2 | 2233 | 2 | 57 | 1 |
3 | 182 | 23 | 112 | 1 |
4 | 22 | 17 | 21 | 1 |
5 | 3 | 8 | 742 | 1 |
6 | 507 | 2 | 55 | 1 |
7 | 154 | 25 | 56 | 1 |
8 | 100 | 83 | 18 | 1 |
9 | 1110 | 2 | 31 | 1 |
10 | 141 | 29 | 7 | 1 |
... | ... | ... | ... | ... |
对应的初始化代码:
create table icews14s
(
fact_id int auto_increment
primary key,
from_id int not null,
relation_id int not null,
to_id int not null,
link_time int not null
);
create index icews14s_from_id_index
on icews14s (from_id);
create index icews14s_link_time_index
on icews14s (link_time);
create index icews14s_to_id_index
on icews14s (to_id);
还有一个长查询:
with target as (select fact_id, from_id, link_time from icews14s where fact_id = 69298),
pre_nodes_1 as (select o.fact_id as fact_id,
o.from_id as from_id,
o.relation_id as relation_id,
o.to_id as to_id,
o.link_time as link_time,
1 as degree
from icews14s o
left join (select * from icews14s) t
on o.from_id = t.from_id and o.link_time < t.link_time
where t.fact_id in (select fact_id from target)),
pre_nodes_2 as (select o.fact_id as fact_id,
o.from_id as from_id,
o.relation_id as relation_id,
o.to_id as to_id,
o.link_time as link_time,
2 as degree
from icews14s o
left join (select * from icews14s) t on o.from_id = t.to_id and o.link_time < t.link_time
where t.fact_id in (select fact_id from pre_nodes_1)),
pre_nodes_3 as (select o.fact_id as fact_id,
o.from_id as from_id,
o.relation_id as relation_id,
o.to_id as to_id,
o.link_time as link_time,
3 as degree
from icews14s o
left join (select * from icews14s) t on o.from_id = t.to_id and o.link_time < t.link_time
where t.fact_id in (select fact_id from pre_nodes_2))
select fact_id, from_id, relation_id, to_id, link_time, 0 as degree from icews14s where fact_id = 69298
union select * from pre_nodes_1
union select * from pre_nodes_2
union select * from pre_nodes_3
order by fact_id desc limit 30;
对应的结果是:
事实ID | 来自_id | 关系_id | to_id | 链接时间 | 度 |
---|---|---|---|---|---|
69298 | 1659 | 16 | 269 | 285 | 0 |
60977 | 1659 | 37 | 3176 | 253 | 1 |
58981 | 3176 | 1 | 3281 | 245 | 2 |
58757 | 1659 | 8 | 1884 | 245 | 1 |
58722 | 1659 | 0 | 1884 | 245 | 1 |
39282 | 1659 | 1 | 105 | 163 | 1 |
38740 | 105 | 7 | 1143 | 161 | 2 |
38570 | 105 | 29 | 1815 | 161 | 2 |
38440 | 105 | 19 | 2 | 160 | 2 |
38101 | 2 | 28 | 52 | 159 | 3 |
38061 | 105 | 0 | 581 | 158 | 2 |
37825 | 2 | 14 | 1057 | 157 | 3 |
37822 | 2 | 2 | 228 | 157 | 3 |
37606 | 2 | 2 | 1006 | 156 | 3 |
37597 | 2 | 9 | 9 | 156 | 3 |
37554 | 2 | 2 | 9 | 156 | 3 |
37390 | 2 | 99 | 9 | 156 | 3 |
37322 | 2 | 8 | 48 | 155 | 3 |
37277 | 2 | 2 | 9 | 155 | 3 |
37266 | 2 | 9 | 1068 | 155 | 3 |
37210 | 2 | 8 | 239 | 155 | 3 |
37120 | 2 | 2 | 90 | 155 | 3 |
37032 | 2 | 2 | 9 | 154 | 3 |
36993 | 2 | 8 | 28 | 154 | 3 |
36988 | 2 | 71 | 136 | 154 | 3 |
36971 | 2 | 2 | 90 | 154 | 3 |
36949 | 2 | 9 | 48 | 154 | 3 |
36896 | 2 | 29 | 9 | 154 | 3 |
36827 | 2 | 28 | 309 | 154 | 3 |
36798 | 2 | 10 | 52 | 153 | 3 |
问题是,在这样一个10万条记录的表中,这个查询非常慢。
有什么方法可以加快速度吗?
我尝试使用递归临时表来解决,但是这个语句永远不会结束:
WITH RECURSIVE pre_nodes AS (
SELECT
fact_id,
from_id,
relation_id,
to_id,
link_time,
0 AS degree
FROM
icews14s
WHERE
fact_id = 69298
UNION
SELECT
o.fact_id,
o.from_id,
o.relation_id,
o.to_id,
o.link_time,
n.degree + 1
FROM
icews14s o
JOIN
pre_nodes n ON IF(n.degree = 0, (o.from_id = n.from_id AND o.link_time < n.link_time),
(o.from_id = n.to_id AND o.link_time < n.link_time))
WHERE
o.fact_id != 69298
)
SELECT distinct
fact_id,
from_id,
relation_id,
to_id,
link_time,
degree
FROM
pre_nodes
ORDER BY
fact_id DESC
LIMIT
30;
您的第一个查询可以大大简化。我们以
pre_nodes_1
cte 为例:
pre_nodes_1 as (
select o.fact_id as fact_id,
o.from_id as from_id,
o.relation_id as relation_id,
o.to_id as to_id,
o.link_time as link_time,
1 as degree
from icews14s o
left join (select * from icews14s) t
on o.from_id = t.from_id
and o.link_time < t.link_time
where t.fact_id in (select fact_id from target)
)
作为O。 Jones在评论中指出,
(select * from icews14s)
的嵌套是不必要的,但是优化器应该发现这一点并删除嵌套。由于 left join
的标准,inner join
将隐式转换为 t.fact_id
。鉴于 t.fact_id in (select fact_id from target)
,我认为 cte 可以重写为:
pre_nodes_1 as (
select o.*, 1 as degree
from icews14s o
join target t
on o.from_id = t.from_id and o.link_time < t.link_time
)
同样的模式也适用于其他两个
pre_nodes_*
ctes。考虑到每个连接中的 o.link_time < t.link_time
标准,无需对 UNION
进行重复数据删除,因此切换到 UNION ALL
将通过删除重复数据删除步骤来减少开销。
with target as (
select fact_id, from_id, link_time from icews14s where fact_id = 69298
),
pre_nodes_1 as (
select o.*, 1 as degree
from icews14s o
join target t
on o.from_id = t.from_id and o.link_time < t.link_time
),
pre_nodes_2 as (
select o.*, 2 as degree
from icews14s o
join pre_nodes_1 t
on o.from_id = t.to_id and o.link_time < t.link_time
),
pre_nodes_3 as (
select o.*, 3 as degree
from icews14s o
join pre_nodes_2 t
on o.from_id = t.to_id and o.link_time < t.link_time
)
select fact_id, from_id, relation_id, to_id, link_time, 0 as degree
from icews14s
where fact_id = 69298
union all
select * from pre_nodes_1
union all
select * from pre_nodes_2
union all
select * from pre_nodes_3
order by fact_id desc
limit 30;
此查询可能会受益于
(from_id, link_time)
上的复合索引:
alter table icews14s
add index idx_from_id_link_time (from_id, link_time);
对于您的递归 cte,我怀疑性能问题是由于连接条件造成的。如果没有合理数量的测试数据,我无法对此进行测试,但我怀疑像这样重写 cte 的递归部分可能允许它使用索引:
SELECT o.*, n.degree + 1
FROM pre_nodes n
JOIN icews14s o
ON o.from_id = IF(n.degree = 0, n.from_id, n.to_id)
AND o.link_time < n.link_time
AND o.fact_id <> 69298