Slow merge of millions of rows

Question

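I am merging several million (1-10 million) rows into a table that holds more than 20 million rows.

The target table has 16 columns, with a single PK column (NormalID).

The problem I found is that the PK ID of some records changes. For example, record "A" with PK id "12345678" will have its id changed to "4567890". That is why I added a delete to the merge. I limit it to 2 months of source data at a time.

Table script below: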

```
CREATE TABLE [reporting].[slowmergetbl](
    [sEntity] [varchar](50) NOT NULL,
    [wYear] [smallint] NOT NULL,
    [wPeriod] [smallint] NOT NULL,
    [sAccount] [varchar](50) NOT NULL,
    [wBracket] [smallint] NOT NULL,
    [sCurrency] [varchar](3) NULL,
    [dValue] [float] NULL,
    [bDirty] [bit] NULL DEFAULT 1,
    [dFactValue] [float] NULL,
    [wEntityId] [int] NULL,
    [wAccountId] [int] NULL,
    [wTimeId] [int] NULL,
    [wExtDimId1] [int] NOT NULL DEFAULT 0,
    [NormalID] [int] NOT NULL,
    [ModifiedDate] [datetime] NULL,
    [ModifyType] [varchar](10) NULL,
    CONSTRAINT [PK_NormalID] PRIMARY KEY CLUSTERED ([NormalID] ASC),
    INDEX [IDX_Year] NONCLUSTERED ([wYear] ASC),
    INDEX [IDX_Period] NONCLUSTERED ([wPeriod] ASC),
    INDEX [IX_ReportQuery] NONCLUSTERED ([wYear], [dFactValue])
        INCLUDE ([sEntity],[wPeriod],[sAccount],[wBracket],[wAccountId]),
    INDEX [IX_ReportQuery2] NONCLUSTERED ([sAccount], [wBracket])
        INCLUDE ([sEntity],[wYear],[wPeriod],[dValue],[wEntityId]),
    INDEX [IDX_Normal_ModifyType] NONCLUSTERED ([ModifyType] ASC)
);
```

Merge statement:

```
MERGE reporting.slowmergetbl AS a
USING dbo.slowmergetbl AS b ON a.normalid = b.normalid

WHEN NOT MATCHED BY SOURCE AND a.wyear IN (2024) AND a.wPeriod IN (9, 10)
THEN UPDATE SET
    a.modifieddate = getdate(),
    a.modifytype = 'DELETED'

WHEN NOT MATCHED BY TARGET
THEN INSERT (
    [sEntity],
    [wYear],
    [wPeriod],
    [sAccount],
    [wBracket],
    [sCurrency],
    [dValue],
    [bDirty],
    [dFactValue],
    [wEntityId],
    [wAccountId],
    [wTimeId],
    [wExtDimId1],
    [NormalID],
    [ModifiedDate],
    [ModifyType]
)
VALUES (
    b.[sEntity],
    b.[wYear],
    b.[wPeriod],
    b.[sAccount],
    b.[wBracket],
    b.[sCurrency],
    b.[dValue],
    b.[bDirty],
    b.[dFactValue],
    b.[wEntityId],
    b.[wAccountId],
    b.[wTimeId],
    b.[wExtDimId1],
    b.[NormalID],
    getdate(), 'INSERT')

WHEN MATCHED
AND (
    isnull(a.[sAccount],'') <> isnull(b.[sAccount],'')
    OR isnull(a.[sEntity],'') <> isnull(b.[sEntity],'')
    OR isnull(a.[wYear],0) <> isnull(b.[wYear],0)
    OR isnull(a.[wPeriod],0) <> isnull(b.[wPeriod],0)
    OR isnull(a.[wBracket],0) <> isnull(b.[wBracket],0)
    OR isnull(a.[wEntityId],0) <> isnull(b.[wEntityId],0)
    OR isnull(a.[wAccountId],0) <> isnull(b.[wAccountId],0)
    OR isnull(a.[sCurrency],'') <> isnull(b.[sCurrency],'')
    OR isnull(cast(a.[dValue] as float),0.00) <> isnull(cast(b.[dValue] as float),0.00)
    OR isnull(a.[bDirty],0) <> isnull(b.[bDirty],0)
    OR isnull(cast(a.[dFactValue] as float),0.00) <> isnull(cast(b.[dFactValue] as float),0.00)
    OR isnull(a.[wTimeId],0) <> isnull(b.[wTimeId],0)
    OR isnull(a.[wExtDimId1],0) <> isnull(b.[wExtDimId1],0)
)
THEN UPDATE SET
    a.[sAccount] = b.[sAccount],
    a.[sEntity] = b.[sEntity],
    a.[wYear] = b.[wYear],
    a.[wPeriod] = b.[wPeriod],
    a.[wBracket] = b.[wBracket],
    a.[wEntityId] = b.[wEntityId],
    a.[wAccountId] = b.[wAccountId],
    a.[sCurrency] = b.[sCurrency],
    a.[dValue] = b.[dValue],
    a.[bDirty] = b.[bDirty],
    a.[dFactValue] = b.[dFactValue],
    a.[wTimeId] = b.[wTimeId],
    a.[wExtDimId1] = b.[wExtDimId1],
    a.modifieddate = getdate(),
    a.modifytype = 'UPDATE';
```

It runs very slowly, sometimes taking hours.

Any suggestions on how to improve this and speed up the merge?

Thanks.

Answer 1

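Some suggestions:

Check that you have an index on the following column:
```
reporting.slowmergetbl.NormalID
```
Sometimes splitting the MERGE into 3 separate statements (INSERT/UPDATE/UPDATE; the "deleted" branch here is also an UPDATE) helps. At the very least you will find out which part causes most of the delay and can analyze its execution plan in detail.
Check whether the logic that decides if an update is needed (the "when matched" part) is too complex. Logic like this can sometimes lead to nested loops in the execution plan and very poor performance. Consider adding a "hash" column to both tables, computed from all the fields you compare (sAccount, sEntity, ...). It should look something like
```
HASHBYTES('SHA2_256', CONCAT(sAccount,'|',sEntity,'|',...))
```
and then use only this column to compare the data and decide whether a row needs to be updated.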
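
As a rough illustration of the last two suggestions combined, the sketch below adds a computed hash column to both tables and then replaces the MERGE with three separate statements that can be timed and tuned individually. The column name RowHash is hypothetical, dbo.slowmergetbl is assumed to have the same columns as the target, and CONVERT style 3 (lossless float-to-string) assumes SQL Server 2016 or later. Note that CONCAT renders NULL as an empty string, so a NULL and a 0 in a numeric column hash differently, unlike the isnull(..., 0) comparisons in the original statement.

```sql
-- Sketch only: RowHash is a hypothetical column name, not part of the original schema.
-- Floats go through CONVERT style 3 because CONCAT's implicit conversion
-- keeps only about six significant digits.
ALTER TABLE reporting.slowmergetbl ADD RowHash AS
    HASHBYTES('SHA2_256', CONCAT(
        sAccount, '|', sEntity, '|', wYear, '|', wPeriod, '|', wBracket, '|',
        wEntityId, '|', wAccountId, '|', sCurrency, '|',
        CONVERT(varchar(30), dValue, 3), '|', bDirty, '|',
        CONVERT(varchar(30), dFactValue, 3), '|', wTimeId, '|', wExtDimId1));

ALTER TABLE dbo.slowmergetbl ADD RowHash AS
    HASHBYTES('SHA2_256', CONCAT(
        sAccount, '|', sEntity, '|', wYear, '|', wPeriod, '|', wBracket, '|',
        wEntityId, '|', wAccountId, '|', sCurrency, '|',
        CONVERT(varchar(30), dValue, 3), '|', bDirty, '|',
        CONVERT(varchar(30), dFactValue, 3), '|', wTimeId, '|', wExtDimId1));
-- Batch separator (SSMS/sqlcmd) so the statements below can see RowHash.
GO

-- 1) Mark target rows in the loaded window that no longer exist in the source.
UPDATE a
SET a.ModifiedDate = GETDATE(),
    a.ModifyType   = 'DELETED'
FROM reporting.slowmergetbl AS a
WHERE a.wYear = 2024
  AND a.wPeriod IN (9, 10)
  AND NOT EXISTS (SELECT 1 FROM dbo.slowmergetbl AS b
                  WHERE b.NormalID = a.NormalID);

-- 2) Update matched rows whose hash differs. Run before the INSERT so that
--    freshly inserted rows are not read again by this comparison.
UPDATE a
SET a.sAccount     = b.sAccount,
    a.sEntity      = b.sEntity,
    a.wYear        = b.wYear,
    a.wPeriod      = b.wPeriod,
    a.wBracket     = b.wBracket,
    a.wEntityId    = b.wEntityId,
    a.wAccountId   = b.wAccountId,
    a.sCurrency    = b.sCurrency,
    a.dValue       = b.dValue,
    a.bDirty       = b.bDirty,
    a.dFactValue   = b.dFactValue,
    a.wTimeId      = b.wTimeId,
    a.wExtDimId1   = b.wExtDimId1,
    a.ModifiedDate = GETDATE(),
    a.ModifyType   = 'UPDATE'
FROM reporting.slowmergetbl AS a
JOIN dbo.slowmergetbl AS b ON b.NormalID = a.NormalID
WHERE a.RowHash <> b.RowHash;

-- 3) Insert source rows that are missing from the target.
INSERT INTO reporting.slowmergetbl
    (sEntity, wYear, wPeriod, sAccount, wBracket, sCurrency, dValue, bDirty,
     dFactValue, wEntityId, wAccountId, wTimeId, wExtDimId1, NormalID,
     ModifiedDate, ModifyType)
SELECT b.sEntity, b.wYear, b.wPeriod, b.sAccount, b.wBracket, b.sCurrency,
       b.dValue, b.bDirty, b.dFactValue, b.wEntityId, b.wAccountId, b.wTimeId,
       b.wExtDimId1, b.NormalID, GETDATE(), 'INSERT'
FROM dbo.slowmergetbl AS b
WHERE NOT EXISTS (SELECT 1 FROM reporting.slowmergetbl AS a
                  WHERE a.NormalID = b.NormalID);
```

Running the statements one at a time with SET STATISTICS TIME ON, or with the actual execution plan enabled, shows which step dominates. If the all-or-nothing behaviour of the original MERGE matters, wrap the three statements in a single transaction. The computed column is not persisted here, so the hash is recomputed on each read; whether to persist or index it is something to test separately.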
