Optimizing a SQL query with recursive subqueries (MySQL)

I am completely new to MySQL, so thank you for your patience. I have the following MySQL table:

fact_id from_id relation_id to_id link_time
1 19 6 151 1
2 2233 2 57 1
3 182 23 112 1
4 22 17 21 1
5 3 8 742 1
6 507 2 55 1
7 154 25 56 1
8 100 83 18 1
9 1110 2 31 1
10 141 29 7 1
... ... ... ... ...

The corresponding initialization code:

create table icews14s
(
    fact_id     int auto_increment
        primary key,
    from_id     int not null,
    relation_id int not null,
    to_id       int not null,
    link_time   int not null
);

create index icews14s_from_id_index
    on icews14s (from_id);

create index icews14s_link_time_index
    on icews14s (link_time);

create index icews14s_to_id_index
    on icews14s (to_id);

And a long query:

with target as (select fact_id, from_id, link_time from icews14s where fact_id = 69298),
             pre_nodes_1 as (select o.fact_id     as fact_id,
                                    o.from_id     as from_id,
                                    o.relation_id as relation_id,
                                    o.to_id       as to_id,
                                    o.link_time   as link_time,
                                    1             as degree
                             from icews14s o
                                      left join (select * from icews14s) t
                                                on o.from_id = t.from_id and o.link_time < t.link_time
                             where t.fact_id in (select fact_id from target)),
             pre_nodes_2 as (select o.fact_id     as fact_id,
                                    o.from_id     as from_id,
                                    o.relation_id as relation_id,
                                    o.to_id       as to_id,
                                    o.link_time   as link_time,
                                    2             as degree
                             from icews14s o
                                      left join (select * from icews14s) t on o.from_id = t.to_id and o.link_time < t.link_time
                             where t.fact_id in (select fact_id from pre_nodes_1)),
             pre_nodes_3 as (select o.fact_id     as fact_id,
                                    o.from_id     as from_id,
                                    o.relation_id as relation_id,
                                    o.to_id       as to_id,
                                    o.link_time   as link_time,
                                    3             as degree
                             from icews14s o
                                      left join (select * from icews14s) t on o.from_id = t.to_id and o.link_time < t.link_time
                             where t.fact_id in (select fact_id from pre_nodes_2))
        select fact_id, from_id, relation_id, to_id, link_time, 0 as degree from icews14s where fact_id = 69298
        union select * from pre_nodes_1
        union select * from pre_nodes_2
        union select * from pre_nodes_3
        order by fact_id desc limit 30;

The corresponding result is:

fact_id from_id relation_id to_id link_time degree
69298 1659 16 269 285 0
60977 1659 37 3176 253 1
58981 3176 1 3281 245 2
58757 1659 8 1884 245 1
58722 1659 0 1884 245 1
39282 1659 1 105 163 1
38740 105 7 1143 161 2
38570 105 29 1815 161 2
38440 105 19 2 160 2
38101 2 28 52 159 3
38061 105 0 581 158 2
37825 2 14 1057 157 3
37822 2 2 228 157 3
37606 2 2 1006 156 3
37597 2 9 9 156 3
37554 2 2 9 156 3
37390 2 99 9 156 3
37322 2 8 48 155 3
37277 2 2 9 155 3
37266 2 9 1068 155 3
37210 2 8 239 155 3
37120 2 2 90 155 3
37032 2 2 9 154 3
36993 2 8 28 154 3
36988 2 71 136 154 3
36971 2 2 90 154 3
36949 2 9 48 154 3
36896 2 29 9 154 3
36827 2 28 309 154 3
36798 2 10 52 153 3

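The problem is that on a table like this one, with about 100,000 rows, the query is very slow.

Is there any way to speed it up?

I tried to solve it with a recursive CTE, but this statement never finishes: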

WITH RECURSIVE pre_nodes AS (
    SELECT
        fact_id,
        from_id,
        relation_id,
        to_id,
        link_time,
        0 AS degree
    FROM
        icews14s
    WHERE
        fact_id = 69298
    UNION
    SELECT
        o.fact_id,
        o.from_id,
        o.relation_id,
        o.to_id,
        o.link_time,
        n.degree + 1
    FROM
        icews14s o
    JOIN
    pre_nodes n ON IF(n.degree = 0, (o.from_id = n.from_id AND o.link_time < n.link_time),
                      (o.from_id = n.to_id AND o.link_time < n.link_time))
    WHERE
        o.fact_id != 69298
)
SELECT distinct
    fact_id,
    from_id,
    relation_id,
    to_id,
    link_time,
    degree
FROM
    pre_nodes
ORDER BY
    fact_id DESC
LIMIT
    30;

Tags: sql mysql query-optimization knowledge-graph

1 Answer

Your first query can be simplified considerably. Take the pre_nodes_1 CTE as an example:

pre_nodes_1 as (
    select o.fact_id     as fact_id,
           o.from_id     as from_id,
           o.relation_id as relation_id,
           o.to_id       as to_id,
           o.link_time   as link_time,
           1             as degree
    from icews14s o
    left join (select * from icews14s) t
        on o.from_id = t.from_id
       and o.link_time < t.link_time
    where t.fact_id in (select fact_id from target)
)

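As O. Jones pointed out in the comments, the nesting in (select * from icews14s) is unnecessary, although the optimizer should detect this and remove it. Because the WHERE clause filters on t.fact_id, which rejects the NULL rows that an unmatched left join would produce, the left join is implicitly converted to an inner join. Given t.fact_id in (select fact_id from target), I think the CTE can be rewritten as: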

pre_nodes_1 as (
    select o.*, 1 as degree
    from icews14s o
    join target t
        on o.from_id = t.from_id and o.link_time < t.link_time
)
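
One way to gain confidence in this rewrite is to count the rows each form returns for the same target. The following is only a sketch: the CTE names original_form and rewritten_form are made up for illustration, and the original form may still be slow on the full table.

```sql
-- Sketch: compare the original and rewritten degree-1 lookups.
-- original_form / rewritten_form are illustrative names, not part of the question.
with target as (
    select fact_id, from_id, link_time from icews14s where fact_id = 69298
),
original_form as (
    select o.fact_id
    from icews14s o
    left join (select * from icews14s) t
        on o.from_id = t.from_id and o.link_time < t.link_time
    where t.fact_id in (select fact_id from target)
),
rewritten_form as (
    select o.fact_id
    from icews14s o
    join target t
        on o.from_id = t.from_id and o.link_time < t.link_time
)
select
    (select count(*) from original_form)  as original_rows,
    (select count(*) from rewritten_form) as rewritten_rows;
```

If the two counts differ, the rewrite is not equivalent for your data and should be checked before it replaces the original.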

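The same pattern applies to the other two pre_nodes_* CTEs. Given the o.link_time < t.link_time criterion in each join, no row can join to itself, so I see no need for UNION to deduplicate; switching to UNION ALL removes the deduplication step and its overhead. This assumes that no fact is reached through more than one predecessor at the same degree; where that can happen, the plain joins return that fact once per predecessor.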

with target as (
    select fact_id, from_id, link_time from icews14s where fact_id = 69298
),
pre_nodes_1 as (
    select o.*, 1 as degree
    from icews14s o
    join target t
        on o.from_id = t.from_id and o.link_time < t.link_time
),
pre_nodes_2 as (
    select o.*, 2 as degree
    from icews14s o
    join pre_nodes_1 t
        on o.from_id = t.to_id and o.link_time < t.link_time
),
pre_nodes_3 as (
    select o.*, 3 as degree
    from icews14s o
    join pre_nodes_2 t
        on o.from_id = t.to_id and o.link_time < t.link_time
)

select fact_id, from_id, relation_id, to_id, link_time, 0 as degree
from icews14s
where fact_id = 69298
union all
select * from pre_nodes_1
union all
select * from pre_nodes_2
union all
select * from pre_nodes_3
order by fact_id desc
limit 30;
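
A caveat on UNION ALL: in the sample output from the question, facts 58757 and 58722 are both degree 1 with to_id 1884 and link_time 245, so any earlier fact with from_id 1884 would be joined to both of them and appear twice at degree 2. A minimal sketch of one way to handle that, assuming such duplicates are unwanted, is to deduplicate inside the CTE that can produce them (shown here only up to degree 2):

```sql
-- Sketch: deduplicate inside the CTE instead of relying on UNION.
with target as (
    select fact_id, from_id, link_time from icews14s where fact_id = 69298
),
pre_nodes_1 as (
    select o.*, 1 as degree
    from icews14s o
    join target t
        on o.from_id = t.from_id and o.link_time < t.link_time
),
pre_nodes_2 as (
    -- distinct collapses facts reached through several degree-1 predecessors
    select distinct o.*, 2 as degree
    from icews14s o
    join pre_nodes_1 t
        on o.from_id = t.to_id and o.link_time < t.link_time
)
select * from pre_nodes_1
union all
select * from pre_nodes_2
order by fact_id desc
limit 30;
```

Keeping plain UNION in the final statement is the other option; which one is cheaper depends on how many duplicates the joins actually produce.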

The simplified query above would likely benefit from a composite index on (from_id, link_time):

alter table icews14s
    add index idx_from_id_link_time (from_id, link_time);
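
To see whether the optimizer actually picks up the new index, the plan for a single lookup step can be inspected with EXPLAIN. This is a sketch that uses a simplified form of the degree-1 join rather than the full query:

```sql
-- Sketch: inspect the plan for the degree-1 lookup after adding the index.
explain
select o.*
from icews14s o
join icews14s t
    on o.from_id = t.from_id and o.link_time < t.link_time
where t.fact_id = 69298;
```

In the output, the key column for table o shows which index, if any, was chosen. On MySQL 8.0.18 or later, EXPLAIN ANALYZE additionally runs the statement and reports measured timings.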

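As for your recursive CTE, I suspect the performance problem is caused by the join condition. I cannot test this without a reasonable amount of test data, but I suspect that rewriting the recursive part of the CTE like this may allow it to use an index: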

    SELECT o.*, n.degree + 1
    FROM pre_nodes n
    JOIN icews14s o
        ON o.from_id = IF(n.degree = 0, n.from_id, n.to_id)
        AND o.link_time < n.link_time
        AND o.fact_id <> 69298
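
Separately from the join condition, the recursive member in the question has no depth limit, so unlike the three-level query it keeps expanding until link_time runs out, which may be why the statement appears never to finish. A sketch of the complete statement, assuming the intent is to stop at degree 3 as in the non-recursive version:

```sql
-- Sketch: recursive form capped at degree 3 (the cap is an assumption
-- taken from the three-level query, not something the answer specifies).
WITH RECURSIVE pre_nodes AS (
    SELECT fact_id, from_id, relation_id, to_id, link_time, 0 AS degree
    FROM icews14s
    WHERE fact_id = 69298
    UNION  -- distinct: drops a fact reached twice at the same degree
    SELECT o.fact_id, o.from_id, o.relation_id, o.to_id, o.link_time,
           n.degree + 1
    FROM pre_nodes n
    JOIN icews14s o
        ON o.from_id = IF(n.degree = 0, n.from_id, n.to_id)
        AND o.link_time < n.link_time
    WHERE n.degree < 3  -- stop expanding once degree 3 has been produced
)
SELECT fact_id, from_id, relation_id, to_id, link_time, degree
FROM pre_nodes
ORDER BY fact_id DESC
LIMIT 30;
```

The o.fact_id <> 69298 filter is left out here because every recursive step requires a strictly smaller link_time, so the starting fact cannot be reached again; adding it back is harmless.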