SQL query optimization for recursive subqueries (MySQL)

Question

I'm completely new to MySQL, so thank you for your patience. I have the following MySQL table:

fact_id	from_id	relation_id	to_id	link_time
1	19	6	151	1
2	2233	2	57	1
3	182	23	112	1
4	22	17	21	1
5	3	8	742	1
6	507	2	55	1
7	154	25	56	1
8	100	83	18	1
9	1110	2	31	1
10	141	29	7	1
...	...	...	...	...

The corresponding initialization code:

create table icews14s
(
    fact_id     int auto_increment
        primary key,
    from_id     int not null,
    relation_id int not null,
    to_id       int not null,
    link_time   int not null
);

create index icews14s_from_id_index
    on icews14s (from_id);

create index icews14s_link_time_index
    on icews14s (link_time);

create index icews14s_to_id_index
    on icews14s (to_id);

And a long query:

with target as (select fact_id, from_id, link_time from icews14s where fact_id = 69298),
             pre_nodes_1 as (select o.fact_id     as fact_id,
                                    o.from_id     as from_id,
                                    o.relation_id as relation_id,
                                    o.to_id       as to_id,
                                    o.link_time   as link_time,
                                    1             as degree
                             from icews14s o
                                      left join (select * from icews14s) t
                                                on o.from_id = t.from_id and o.link_time < t.link_time
                             where t.fact_id in (select fact_id from target)),
             pre_nodes_2 as (select o.fact_id     as fact_id,
                                    o.from_id     as from_id,
                                    o.relation_id as relation_id,
                                    o.to_id       as to_id,
                                    o.link_time   as link_time,
                                    2             as degree
                             from icews14s o
                                      left join (select * from icews14s) t on o.from_id = t.to_id and o.link_time < t.link_time
                             where t.fact_id in (select fact_id from pre_nodes_1)),
             pre_nodes_3 as (select o.fact_id     as fact_id,
                                    o.from_id     as from_id,
                                    o.relation_id as relation_id,
                                    o.to_id       as to_id,
                                    o.link_time   as link_time,
                                    3             as degree
                             from icews14s o
                                      left join (select * from icews14s) t on o.from_id = t.to_id and o.link_time < t.link_time
                             where t.fact_id in (select fact_id from pre_nodes_2))
        select fact_id, from_id, relation_id, to_id, link_time, 0 as degree from icews14s where fact_id = 69298
        union select * from pre_nodes_1
        union select * from pre_nodes_2
        union select * from pre_nodes_3
        order by fact_id desc limit 30;

The corresponding result is:

fact_id	from_id	relation_id	to_id	link_time	degree
69298	1659	16	269	285	0
60977	1659	37	3176	253	1
58981	3176	1	3281	245	2
58757	1659	8	1884	245	1
58722	1659	0	1884	245	1
39282	1659	1	105	163	1
38740	105	7	1143	161	2
38570	105	29	1815	161	2
38440	105	19	2	160	2
38101	2	28	52	159	3
38061	105	0	581	158	2
37825	2	14	1057	157	3
37822	2	2	228	157	3
37606	2	2	1006	156	3
37597	2	9	9	156	3
37554	2	2	9	156	3
37390	2	99	9	156	3
37322	2	8	48	155	3
37277	2	2	9	155	3
37266	2	9	1068	155	3
37210	2	8	239	155	3
37120	2	2	90	155	3
37032	2	2	9	154	3
36993	2	8	28	154	3
36988	2	71	136	154	3
36971	2	2	90	154	3
36949	2	9	48	154	3
36896	2	29	9	154	3
36827	2	28	309	154	3
36798	2	10	52	153	3

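The problem is that this query is very slow on a table of about 100,000 rows.

Is there any way to speed it up?

I tried to solve it with a recursive CTE, but this statement never finishes: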

WITH RECURSIVE pre_nodes AS (
    SELECT
        fact_id,
        from_id,
        relation_id,
        to_id,
        link_time,
        0 AS degree
    FROM
        icews14s
    WHERE
        fact_id = 69298
    UNION
    SELECT
        o.fact_id,
        o.from_id,
        o.relation_id,
        o.to_id,
        o.link_time,
        n.degree + 1
    FROM
        icews14s o
    JOIN
    pre_nodes n ON IF(n.degree = 0, (o.from_id = n.from_id AND o.link_time < n.link_time),
                      (o.from_id = n.to_id AND o.link_time < n.link_time))
    WHERE
        o.fact_id != 69298
)
SELECT distinct
    fact_id,
    from_id,
    relation_id,
    to_id,
    link_time,
    degree
FROM
    pre_nodes
ORDER BY
    fact_id DESC
LIMIT
    30;

Answer 1

Your first query can be simplified considerably. Take the pre_nodes_1 CTE as an example:

pre_nodes_1 as (
    select o.fact_id     as fact_id,
           o.from_id     as from_id,
           o.relation_id as relation_id,
           o.to_id       as to_id,
           o.link_time   as link_time,
           1             as degree
    from icews14s o
    left join (select * from icews14s) t
        on o.from_id = t.from_id
       and o.link_time < t.link_time
    where t.fact_id in (select fact_id from target)
)

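As O. Jones pointed out in the comments, the nesting of (select * from icews14s) is unnecessary, although the optimizer should detect this and remove the nesting. Because of the criterion on t.fact_id in the WHERE clause, the left join is implicitly converted to an inner join: a row with no match on the t side has a NULL t.fact_id and can never satisfy it. Given t.fact_id in (select fact_id from target), I think the CTE can be rewritten as: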

pre_nodes_1 as (
    select o.*, 1 as degree
    from icews14s o
    join target t
        on o.from_id = t.from_id and o.link_time < t.link_time
)
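
One way to convince yourself that the two forms are equivalent is to compare their row counts for the same target fact. This is only a sanity-check sketch against the icews14s table from the question; the column aliases left_join_rows and inner_join_rows are made up for illustration.

```sql
-- Both counts should match: the WHERE filter on t.fact_id discards
-- every row the LEFT JOIN padded with NULLs, so it behaves like an inner join.
SELECT
    (SELECT COUNT(*)
     FROM icews14s o
     LEFT JOIN icews14s t
         ON o.from_id = t.from_id AND o.link_time < t.link_time
     WHERE t.fact_id = 69298) AS left_join_rows,
    (SELECT COUNT(*)
     FROM icews14s o
     JOIN icews14s t
         ON o.from_id = t.from_id AND o.link_time < t.link_time
     WHERE t.fact_id = 69298) AS inner_join_rows;
```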

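The same pattern applies to the other two pre_nodes_* CTEs. Given the o.link_time < t.link_time criterion in each join, the UNION should not need to deduplicate, so switching to UNION ALL reduces overhead by removing the deduplication step.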

with target as (
    select fact_id, from_id, link_time from icews14s where fact_id = 69298
),
pre_nodes_1 as (
    select o.*, 1 as degree
    from icews14s o
    join target t
        on o.from_id = t.from_id and o.link_time < t.link_time
),
pre_nodes_2 as (
    select o.*, 2 as degree
    from icews14s o
    join pre_nodes_1 t
        on o.from_id = t.to_id and o.link_time < t.link_time
),
pre_nodes_3 as (
    select o.*, 3 as degree
    from icews14s o
    join pre_nodes_2 t
        on o.from_id = t.to_id and o.link_time < t.link_time
)

select fact_id, from_id, relation_id, to_id, link_time, 0 as degree
from icews14s
where fact_id = 69298
union all
select * from pre_nodes_1
union all
select * from pre_nodes_2
union all
select * from pre_nodes_3
order by fact_id desc
limit 30;
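
One caveat worth checking on real data before relying on UNION ALL: if two rows at one level share the same to_id (in the sample result above, facts 58757 and 58722 both have to_id 1884 and link_time 245), the join at the next level matches each earlier fact once per such parent, so the same fact can appear more than once. The sketch below assumes duplicates are unwanted and adds DISTINCT inside the second and third CTEs; it is one possible variant of the query above, not a tested replacement.

```sql
with target as (
    select fact_id, from_id, link_time from icews14s where fact_id = 69298
),
pre_nodes_1 as (
    -- target is a single row, so no duplicates can arise at this level
    select o.*, 1 as degree
    from icews14s o
    join target t
        on o.from_id = t.from_id and o.link_time < t.link_time
),
pre_nodes_2 as (
    -- distinct collapses facts reached through several level-1 parents
    select distinct o.*, 2 as degree
    from icews14s o
    join pre_nodes_1 t
        on o.from_id = t.to_id and o.link_time < t.link_time
),
pre_nodes_3 as (
    select distinct o.*, 3 as degree
    from icews14s o
    join pre_nodes_2 t
        on o.from_id = t.to_id and o.link_time < t.link_time
)
select fact_id, from_id, relation_id, to_id, link_time, 0 as degree
from icews14s
where fact_id = 69298
union all
select * from pre_nodes_1
union all
select * from pre_nodes_2
union all
select * from pre_nodes_3
order by fact_id desc
limit 30;
```

A fact can still show up under two different degree values, exactly as in the original UNION version, because degree is part of the row.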

This query may benefit from a composite index on (from_id, link_time):

alter table icews14s
    add index idx_from_id_link_time (from_id, link_time);
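
Whether the optimizer actually picks the new index can be checked with EXPLAIN. The statement below is a minimal sketch that isolates the first-level join; it assumes the index was created under the name idx_from_id_link_time as above.

```sql
-- In the EXPLAIN output, look at the "key" column of the row for alias o:
-- if the composite index is chosen, it shows idx_from_id_link_time.
EXPLAIN
SELECT o.*
FROM icews14s o
JOIN icews14s t
    ON o.from_id = t.from_id AND o.link_time < t.link_time
WHERE t.fact_id = 69298;
```

If the optimizer still prefers the single-column icews14s_from_id_index, comparing the estimated rows values with and without the composite index can show whether it makes a difference on your data.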

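As for your recursive CTE, I suspect the performance problem is caused by the join condition. Without a reasonable amount of test data I cannot test this, but I suspect that rewriting the recursive part of the CTE like this might allow it to use the index: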

    SELECT o.*, n.degree + 1
    FROM pre_nodes n
    JOIN icews14s o
        ON o.from_id = IF(n.degree = 0, n.from_id, n.to_id)
        AND o.link_time < n.link_time
        AND o.fact_id <> 69298
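
For reference, here is how that recursive part might slot into the full statement. Note one addition that is not in the question's recursive query: a depth limit (n.degree < 3). The three-CTE version stops at degree 3, whereas the recursive version keeps expanding until no earlier facts remain, which may be why it appears never to finish. The limit is an assumption about the intended result, so adjust or remove it as needed.

```sql
WITH RECURSIVE pre_nodes AS (
    -- anchor: the target fact, degree 0
    SELECT fact_id, from_id, relation_id, to_id, link_time, 0 AS degree
    FROM icews14s
    WHERE fact_id = 69298
    UNION
    -- recursive step: earlier facts starting from the node reached so far
    SELECT o.fact_id, o.from_id, o.relation_id, o.to_id, o.link_time,
           n.degree + 1
    FROM pre_nodes n
    JOIN icews14s o
        ON o.from_id = IF(n.degree = 0, n.from_id, n.to_id)
        AND o.link_time < n.link_time
        AND o.fact_id <> 69298
    WHERE n.degree < 3   -- assumed depth limit, mirrors pre_nodes_1..3
)
SELECT fact_id, from_id, relation_id, to_id, link_time, degree
FROM pre_nodes
ORDER BY fact_id DESC
LIMIT 30;
```

UNION is kept here rather than UNION ALL so that a fact reached through several parents at the same degree is only expanded once.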
