Improving recursive CTE performance

I have a very simple recursive CTE that runs against a single source table (REP.INVENTMOVEMENTS) containing about 4 million records. The table is heavily indexed.

with 
dataset as (
    select imv.sourceBatch, 
           imv.targetBatch,
           imv.sourceDataArea, 
           imv.targetDataArea,
           sum(Weight) as Weight
    from REP.INVENTMOVEMENTS imv 
    where imv.sourceBatch <> ''
    Group By imv.sourceBatch, 
             imv.targetBatch,
             imv.sourceDataArea, 
             imv.targetDataArea
    ),
result as (
    select  targetBatch as Batch,
            targetDataArea as DataArea, 
            sourceBatch, 
            targetBatch,   
            sourceDataArea,
            targetDataArea, 
            1 as level,
            Weight
    from dataset
    where sourceBatch <> targetBatch

    union all 

    select result.Batch,
           result.DataArea, 
           dataset.sourceBatch, 
           dataset.targetBatch, 
           dataset.sourceDataArea,
           dataset.targetDataArea, 
           result.level + 1 as level,
           dataset.Weight
    from dataset inner join result on dataset.targetBatch       = result.sourceBatch 
                                  and dataset.targetDataArea    = result.sourceDataArea
                                  and dataset.targetBatch       <> dataset.sourceBatch
    )

select * from result
union all
select      targetBatch as Batch,
            targetDataArea as DataArea, 
            sourceBatch, 
            targetBatch,   
            sourceDataArea,
            targetDataArea, 
            0 as level,
            Weight
    from dataset
    where sourceBatch = targetBatch
;

Running the initial query without a selection takes the database 122 seconds and returns 517,947 records.

Running the same query so that it returns one batch takes the database less than a second and returns 5 records.

However, if I run the CTE with a selection on 1 batch, the database needs 28 seconds to complete 2 levels of recursion and return 7 records.
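One plausible reason for the 28 seconds, offered as an assumption since the filtered query is not shown: if the batch filter sits on the outer `select * from result`, SQL Server typically does not push that predicate into the anchor member of a recursive CTE, so it still expands the lineage of every batch before discarding most of the rows. A minimal sketch of the alternative moves the filter into the anchor member, so that only one batch's lineage is walked. The temp table `#Moves`, its sample rows, and the variables `@Batch` and `@DataArea` are hypothetical stand-ins for the aggregated `dataset` CTE and the batch being looked up.

```sql
DROP TABLE IF EXISTS #Moves;

-- Hypothetical sample data: each row means "SourceBatch was used to create TargetBatch".
-- B3 was made from B1 and B2; B5 was made from B3 and B4.
CREATE TABLE #Moves (
    SourceBatch    nvarchar(100) NOT NULL,
    TargetBatch    nvarchar(100) NOT NULL,
    SourceDataArea nvarchar(10)  NOT NULL,
    TargetDataArea nvarchar(10)  NOT NULL,
    Weight         decimal(18,3) NOT NULL
);

INSERT INTO #Moves (SourceBatch, TargetBatch, SourceDataArea, TargetDataArea, Weight)
VALUES (N'B1', N'B3', N'dat', N'dat', 10.000),
       (N'B2', N'B3', N'dat', N'dat',  5.000),
       (N'B3', N'B5', N'dat', N'dat', 15.000),
       (N'B4', N'B5', N'dat', N'dat',  2.000);

DECLARE @Batch    nvarchar(100) = N'B5';  -- hypothetical: batch whose origins we want
DECLARE @DataArea nvarchar(10)  = N'dat';

WITH result AS (
    -- Anchor member: filter here, so only the requested batch is expanded.
    SELECT m.TargetBatch    AS Batch,
           m.TargetDataArea AS DataArea,
           m.SourceBatch, m.TargetBatch,
           m.SourceDataArea, m.TargetDataArea,
           1 AS level,
           m.Weight
    FROM #Moves AS m
    WHERE m.TargetBatch    = @Batch
      AND m.TargetDataArea = @DataArea
      AND m.SourceBatch   <> m.TargetBatch

    UNION ALL

    -- Recursive member: find the inputs of every input found so far.
    SELECT r.Batch, r.DataArea,
           m.SourceBatch, m.TargetBatch,
           m.SourceDataArea, m.TargetDataArea,
           r.level + 1,
           m.Weight
    FROM #Moves AS m
    INNER JOIN result AS r
            ON m.TargetBatch    = r.SourceBatch
           AND m.TargetDataArea = r.SourceDataArea
    WHERE m.SourceBatch <> m.TargetBatch
)
SELECT Batch, SourceBatch, TargetBatch, level, Weight
FROM result
ORDER BY level, SourceBatch
OPTION (MAXRECURSION 100);  -- same as the default cap, written out to make it visible
```

With these sample rows the query returns B3 and B4 at level 1 and B1 and B2 at level 2.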

I need to populate a table with the results for 150k batches, so if every batch takes half a minute, the job would take 52 days.
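The 52-day figure follows from 150,000 batches × 30 s = 4.5 million seconds. If the 122-second unfiltered run is the full recursive query, it already computes the lineage of every batch, so one alternative, sketched here under that assumption, is to materialize that single run with `INSERT ... SELECT` instead of executing the CTE once per batch. `#Edges` and `#BatchLineage` are hypothetical names; on the real system the anchor would read the aggregated `dataset` CTE over `REP.INVENTMOVEMENTS`.

```sql
DROP TABLE IF EXISTS #Edges;
DROP TABLE IF EXISTS #BatchLineage;

-- Hypothetical pre-aggregated edges: SourceBatch was consumed to create TargetBatch.
CREATE TABLE #Edges (
    SourceBatch nvarchar(100) NOT NULL,
    TargetBatch nvarchar(100) NOT NULL,
    Weight      decimal(18,3) NOT NULL
);

INSERT INTO #Edges (SourceBatch, TargetBatch, Weight)
VALUES (N'B1', N'B3', 10.000),
       (N'B2', N'B3',  5.000),
       (N'B3', N'B5', 15.000),
       (N'B4', N'B5',  2.000);

-- Hypothetical target table, filled once for every batch.
CREATE TABLE #BatchLineage (
    Batch       nvarchar(100) NOT NULL,
    SourceBatch nvarchar(100) NOT NULL,
    TargetBatch nvarchar(100) NOT NULL,
    level       int           NOT NULL,
    Weight      decimal(18,3) NOT NULL
);

WITH result AS (
    -- Anchor member: every batch with its direct inputs (no per-batch filter).
    SELECT e.TargetBatch AS Batch, e.SourceBatch, e.TargetBatch, 1 AS level, e.Weight
    FROM #Edges AS e
    WHERE e.SourceBatch <> e.TargetBatch

    UNION ALL

    -- Recursive member: walk one level further back for all batches at once.
    SELECT r.Batch, e.SourceBatch, e.TargetBatch, r.level + 1, e.Weight
    FROM #Edges AS e
    INNER JOIN result AS r ON e.TargetBatch = r.SourceBatch
    WHERE e.SourceBatch <> e.TargetBatch
)
INSERT INTO #BatchLineage (Batch, SourceBatch, TargetBatch, level, Weight)
SELECT Batch, SourceBatch, TargetBatch, level, Weight
FROM result
OPTION (MAXRECURSION 100);
```

With the sample rows this inserts six rows, two for batch B3 and four for B5; looking up one batch afterwards is a plain `WHERE Batch = ...` on the filled table.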

This is my execution plan:

[Image: execution plan]

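Just to clarify my goal: batches can be merged into new batches, so 2 or more source batches can create a new batch. Two batches created in such merges can in turn be used to create another new batch, and so on.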

I want to be able to pick one batch and find all the batches that were used to create it.

Please take into account that one batch can be used in several other batches.
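Because one batch can feed several others, the lineage forms a graph rather than a tree, and the same ancestor can be reached along more than one path. The sketch below, using a hypothetical `#Merges` table with a diamond-shaped sample, returns each ancestor once (with its nearest level) and carries a visited-path string so that an accidental cycle ends that branch instead of running until the recursion limit aborts the query. The `|`-delimited `VisitPath` column and its nvarchar(4000) size are illustrative choices that assume batch codes never contain `|`.

```sql
DROP TABLE IF EXISTS #Merges;

-- Hypothetical diamond: B1 was used to create both B2 and B3,
-- and B2 and B3 were then merged into B4.
CREATE TABLE #Merges (
    SourceBatch nvarchar(100) NOT NULL,
    TargetBatch nvarchar(100) NOT NULL
);

INSERT INTO #Merges (SourceBatch, TargetBatch)
VALUES (N'B1', N'B2'),
       (N'B1', N'B3'),
       (N'B2', N'B4'),
       (N'B3', N'B4');

DECLARE @Batch nvarchar(100) = N'B4';  -- hypothetical batch to trace

WITH ancestors AS (
    -- Anchor member: direct inputs of @Batch, with the path visited so far.
    SELECT m.SourceBatch,
           1 AS level,
           CAST(N'|' + m.TargetBatch + N'|' + m.SourceBatch + N'|' AS nvarchar(4000)) AS VisitPath
    FROM #Merges AS m
    WHERE m.TargetBatch = @Batch
      AND m.SourceBatch <> m.TargetBatch

    UNION ALL

    -- Recursive member: extend each path by one input.
    SELECT m.SourceBatch,
           a.level + 1,
           CAST(a.VisitPath + m.SourceBatch + N'|' AS nvarchar(4000))
    FROM #Merges AS m
    INNER JOIN ancestors AS a
            ON m.TargetBatch = a.SourceBatch
    -- A source already on this path would close a cycle, so stop that branch.
    WHERE CHARINDEX(N'|' + m.SourceBatch + N'|', a.VisitPath) = 0
)
-- Collapse paths that reach the same ancestor more than once.
SELECT SourceBatch AS AncestorBatch,
       MIN(level)  AS NearestLevel
FROM ancestors
GROUP BY SourceBatch
ORDER BY NearestLevel, AncestorBatch;
```

For B4 this returns B2 and B3 at level 1 and B1 once at level 2, even though B1 is reached through both B2 and B3.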

I hope you can help me here.

sql-server common-table-expression query-performance recursive-query
1 Answer

I solved this by creating an internal table (a table variable) and filling it with the data set needed to run the recursive query.

DECLARE @BatchSequence as table( Batch          nvarchar(100),
                                 SourceBatch    nvarchar(100),
                                 TargetBatch    nvarchar(100),
                                 Weight         decimal(18,3));

-- Stage the aggregated movements once; the recursive CTE below reads only this table.
insert into @BatchSequence 
select ReportingBatch, SourceBatch,TargetBatch, SUM(Weight) as Weight
from REP.INVENTMOVEMENTS
WHERE sourceBatch <> ''
Group By ReportingBatch, SourceBatch,TargetBatch;

with 
result as (
    select  targetBatch as Batch,
            sourceBatch, 
            targetBatch,   
            1 as level,
            Weight
    from @BatchSequence dataset
    where sourceBatch <> targetBatch

    union all 

    select result.Batch,
           dataset.sourceBatch, 
           dataset.targetBatch, 
           result.level + 1 as level,
           dataset.Weight
    from @BatchSequence dataset inner join result on dataset.targetBatch        = result.sourceBatch 
                                  and dataset.targetBatch       <> dataset.sourceBatch
    )

select * from result;

This returns 250k records within 1 minute.
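A possible refinement of the same staging idea, not part of the answer and untested against the real data: a table variable has no column statistics, so the optimizer may misjudge the recursive join, whereas a `#temp` table gets statistics and can carry an index on the join column `TargetBatch`. `#BatchSequence`, `IX_BatchSequence_Target`, and the sample rows are hypothetical; the rows stand in for the `INSERT ... SELECT` from `REP.INVENTMOVEMENTS`.

```sql
DROP TABLE IF EXISTS #BatchSequence;

CREATE TABLE #BatchSequence (
    Batch       nvarchar(100) NOT NULL,
    SourceBatch nvarchar(100) NOT NULL,
    TargetBatch nvarchar(100) NOT NULL,
    Weight      decimal(18,3) NOT NULL
);

-- Hypothetical rows; on the real system this would be the
-- INSERT ... SELECT ... FROM REP.INVENTMOVEMENTS ... GROUP BY from the answer.
INSERT INTO #BatchSequence (Batch, SourceBatch, TargetBatch, Weight)
VALUES (N'B3', N'B1', N'B3', 10.000),
       (N'B3', N'B2', N'B3',  5.000),
       (N'B5', N'B3', N'B5', 15.000),
       (N'B5', N'B4', N'B5',  2.000);

-- Lets each recursion step seek on the join column instead of scanning.
CREATE CLUSTERED INDEX IX_BatchSequence_Target ON #BatchSequence (TargetBatch);

WITH result AS (
    -- Anchor member: every batch with its direct inputs.
    SELECT d.TargetBatch AS Batch, d.SourceBatch, d.TargetBatch, 1 AS level, d.Weight
    FROM #BatchSequence AS d
    WHERE d.SourceBatch <> d.TargetBatch

    UNION ALL

    -- Recursive member: inputs of the inputs found so far.
    SELECT r.Batch, d.SourceBatch, d.TargetBatch, r.level + 1, d.Weight
    FROM #BatchSequence AS d
    INNER JOIN result AS r ON d.TargetBatch = r.SourceBatch
    WHERE d.SourceBatch <> d.TargetBatch
)
SELECT Batch, SourceBatch, TargetBatch, level, Weight
FROM result
ORDER BY Batch, level, SourceBatch;
```

Whether this outperforms the table variable on the real 4 million source rows is something only a measurement on that data can show.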

Hope this helps someone.

© www.soinside.com 2019 - 2024. All rights reserved.