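I'm trying to match inventory to products, but their legacy data has no concept of a product ID, so the join is fairly complex.
I got it working with a bunch of ORs, but it takes over an hour to run (even with good indexes). I would prefer to use CASE to cut down on processing, and therefore on runtime, but I get an error at the = on this line:
r.[Company] = 'CAN' AND
I tried reducing it to just the first case and adding
ELSE 1 END
but I still get the same error, so I'm not sure what is wrong.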
SELECT * FROM
[raw_inventory] r
LEFT JOIN [master_products] mp ON
CASE
/* Logic:
WHEN: Company = CAN AND Shape = RD or Blank/NULL
THEN: Match mNum, Shape and Width
Only in 1, 2, 3 Files
*/
WHEN
r.[Company] = 'CAN' AND
(r.[Shape] = 'RD' OR r.[Shape] = '' OR r.[Shape] IS NULL)
THEN
r.[SHAPE] = mp.[Shape] AND
r.[MASTERNUM] IN ( SELECT value FROM STRING_SPLIT(mp.[mNum],';') ) AND
CAST(oa.[WIDTH] AS Decimal(20,4)) = CAST(mp.[Width] AS Decimal(20,4)) AND
(mp.[File] = 1 OR mp.[File] = 2 OR [File] = 3)
/* Logic:
WHEN: Company = CAN AND Shape = SQ, NO
THEN: Match mNum, Shape and Width
Only in 1, 2, 3 Files
*/
WHEN
r.[Company] = 'CAN' AND
r.[Shape] IN ('SQ', 'NO')
THEN
r.[SHAPE] = mp.[Shape] AND
r.[MASTERNUM] IN ( SELECT value FROM STRING_SPLIT(mp.[mNum],';') ) AND
CAST(oa.[LENGTH] AS Decimal(20,4)) = CAST(mp.[Length] AS Decimal(20,4)) AND
(mp.[File] = 1 OR mp.[File] = 2 OR [File] = 3)
/* Logic:
Company = US
Master Number = 003(alpha)
Match Run Number
Looks in All Files
*/
WHEN
r.[Company] = 'US' AND
PATINDEX('%003[A-TV-Za-tv-z]%', r.[MASTERNUM]) = 1
THEN
r.[Run] IN ( SELECT value FROM STRING_SPLIT(mp.[Run Number],';') ) OR
r.[Run] IN ( SELECT value FROM STRING_SPLIT(mp.[Indexer],';') )
/* Logic:
Company = US
Master Number = 003U
Match RDI Number
Only in 4, 5 Files
*/
WHEN
r.[Company] = 'US' AND
r.[MASTERNUM] LIKE '003U%'
THEN
r.[RDI] = mp.[RDI] AND
([File] = '4' OR [File] = '4')
/* Logic:
Company = US
Match Master Number to Run/Indexer Number
Looks in All Files
*/
ELSE
r.[MASTERNUM] IN ( SELECT value FROM STRING_SPLIT(mp.[Run Number],';') ) OR
r.[MASTERNUM] IN ( SELECT value FROM STRING_SPLIT(mp.[Indexer],';') )
END
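There are already plenty of good comments on this, but you cannot use a CASE expression to return a join condition, which is exactly what you are trying to do. A CASE expression returns a single scalar value, not a predicate. So just implement boolean logic to separate the various sets of conditions you need. (Note: the boolean equivalent of WHEN/ELSE here is OR.)
If you do this, the query will look something like the following (note: I don't guarantee a 100% accurate "translation", but hopefully it is close).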
SELECT *
FROM [raw_inventory] r
LEFT JOIN [master_products] mp ON (
    /* CAN, shape RD or blank/NULL: match mNum, Shape and Width; files 1, 2, 3 only */
    (
        r.[Company] = 'CAN'
        AND (
            r.[Shape] = 'RD'
            OR r.[Shape] = ''
            OR r.[Shape] IS NULL
        )
        AND r.[SHAPE] = mp.[Shape]
        AND r.[MASTERNUM] IN (
            SELECT value
            FROM STRING_SPLIT(mp.[mNum], ';')
        )
        /* NOTE: the alias oa is not defined in the query as posted */
        AND CAST(oa.[WIDTH] AS DECIMAL(20, 4)) = CAST(mp.[Width] AS DECIMAL(20, 4))
        AND mp.[File] IN (1, 2, 3)
    )
    /* separate query here? */
    /* CAN, shape SQ or NO: match mNum, Shape and Length; files 1, 2, 3 only */
    OR (
        r.[Company] = 'CAN'
        AND r.[Shape] IN ('SQ', 'NO')
        AND r.[SHAPE] = mp.[Shape]
        AND r.[MASTERNUM] IN (
            SELECT value
            FROM STRING_SPLIT(mp.[mNum], ';')
        )
        AND CAST(oa.[LENGTH] AS DECIMAL(20, 4)) = CAST(mp.[Length] AS DECIMAL(20, 4))
        AND mp.[File] IN (1, 2, 3)
    )
    /* separate query here? */
    /* US, master number 003 + letter other than U: match Run; all files */
    OR (
        r.[Company] = 'US'
        AND PATINDEX('%003[A-TV-Za-tv-z]%', r.[MASTERNUM]) = 1
        AND (
            r.[Run] IN (
                SELECT value
                FROM STRING_SPLIT(mp.[Run Number], ';')
            )
            OR r.[Run] IN (
                SELECT value
                FROM STRING_SPLIT(mp.[Indexer], ';')
            )
        )
    )
    /* separate query here? */
    /* US, master number 003U: match RDI; files 4, 5 only
       (the posted code tested '4' twice; '5' follows the question's comment) */
    OR (
        r.[Company] = 'US'
        AND r.[MASTERNUM] LIKE '003U%'
        AND r.[RDI] = mp.[RDI]
        AND mp.[File] IN ('4', '5')
    )
    /* separate query here? */
    /* US fallback: match master number to Run Number / Indexer; all files */
    OR (
        r.[Company] = 'US'
        AND (
            r.[MASTERNUM] IN (
                SELECT value
                FROM STRING_SPLIT(mp.[Run Number], ';')
            )
            OR r.[MASTERNUM] IN (
                SELECT value
                FROM STRING_SPLIT(mp.[Indexer], ';')
            )
        )
    )
);
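You will quickly see that this is a very unwieldy join, and it will perform poorly. It is almost certainly better to break it into separate queries and then combine them by unioning them together.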
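As a rough sketch of that split (not a drop-in replacement), here are two of the US branches written as separate queries and combined with UNION ALL. It assumes the two branches are mutually exclusive, which the patterns suggest because the first one excludes the letter U, and it uses INNER JOIN, so inventory rows with no match are not returned the way they would be with the original LEFT JOIN. The file list ('4', '5') follows the comment in the question rather than the posted code.

```sql
-- Branch: US, master number 003 + letter other than U; match on Run, all files
SELECT r.*, mp.*
FROM [raw_inventory] r
INNER JOIN [master_products] mp
    ON r.[Run] IN (SELECT value FROM STRING_SPLIT(mp.[Run Number], ';'))
    OR r.[Run] IN (SELECT value FROM STRING_SPLIT(mp.[Indexer], ';'))
WHERE r.[Company] = 'US'
  AND PATINDEX('%003[A-TV-Za-tv-z]%', r.[MASTERNUM]) = 1

UNION ALL

-- Branch: US, master number 003U; match on RDI, files 4 and 5 only (assumed)
SELECT r.*, mp.*
FROM [raw_inventory] r
INNER JOIN [master_products] mp
    ON r.[RDI] = mp.[RDI]
WHERE r.[Company] = 'US'
  AND r.[MASTERNUM] LIKE '003U%'
  AND mp.[File] IN ('4', '5');
```

If the branches could overlap, UNION ALL would return the same pair twice where the single OR join returns it once, so UNION (or an explicit exclusion in each WHERE clause) may be needed. Each branch can also be tuned and indexed on its own, which is the main point of splitting.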
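I have one further concern that applies even if you break this into multiple queries: your reliance on STRING_SPLIT in several places. Those join conditions cannot be helped by an index and will probably cause table scans, so they are almost certain to perform poorly.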
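I would try to avoid concatenated columns in the master table, but short of remodelling, a tactical approach could be to prepare an indexed temporary table. Alternatively (as also suggested in the comments), use an apply operator (CROSS APPLY or OUTER APPLY, as appropriate) to simplify access to the concatenated column details in the master table, and do this in a CTE to avoid costly repetition.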
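A minimal sketch of the temporary-table idea, for one delimited column only. Several things here are assumptions and not taken from the question: [ProductKey] is a hypothetical unique key on master_products, the name #mp_run is made up, and VARCHAR(50) is a guessed maximum length for a single run number (a bounded type is used so the column can be indexed).

```sql
-- Split [Run Number] once, up front, using CROSS APPLY
SELECT
    mp.[ProductKey],                                      -- hypothetical key column
    CAST(LTRIM(RTRIM(s.value)) AS VARCHAR(50)) AS RunNumber  -- length is an assumption
INTO #mp_run
FROM [master_products] mp
CROSS APPLY STRING_SPLIT(mp.[Run Number], ';') s;

-- Index the split values so the join below can seek instead of scan
CREATE CLUSTERED INDEX IX_mp_run ON #mp_run (RunNumber, ProductKey);

-- The join is now a plain equality on an indexed column
SELECT r.*, mp.*
FROM [raw_inventory] r
INNER JOIN #mp_run x
    ON x.RunNumber = r.[Run]
INNER JOIN [master_products] mp
    ON mp.[ProductKey] = x.[ProductKey]
WHERE r.[Company] = 'US'
  AND PATINDEX('%003[A-TV-Za-tv-z]%', r.[MASTERNUM]) = 1;
```

The same pattern would be repeated for [Indexer] and [mNum]. If r.[Run] is not a character type, the equality would involve an implicit conversion, so the cast in the first statement should be adjusted to match it.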