Full-text search with CONTAINS is very slow

Question · Votes: 2 · Answers: 2

We are trying to use full-text search on an Azure SQL database and are running into performance problems when searching with CONTAINS.

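Our data uses a star schema; the fact table has a clustered columnstore index and about 40 million rows. Below is how we use CONTAINS on a dimension table while aggregating over the fact table, in three different queries: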

Query 1, using EXISTS:

SELECT f.[FK_DimCompanyCodeId], SUM(f.NetValueInUSD)
FROM [SPENDBY].[FactInvoiceDetail] f

WHERE EXISTS (
        SELECT * FROM [SPENDBY].[DimCompanyCode] d

        WHERE f.[FK_DimCompanyCodeId] = d.Id
        AND CONTAINS(d.*, 'Comcast'))

GROUP BY f.[FK_DimCompanyCodeId]

ORDER BY SUM(f.NetValueInUSD) DESC

This query seems to run forever and never returns a result.

There is a nonclustered index on the foreign key [FK_DimCompanyCodeId], and searching for Comcast returns only one row:

SELECT id  FROM [SPENDBY].[DimCompanyCode] d
WHERE CONTAINS(d.*, 'Comcast');
-- will return id = 5

About 27 million rows in the fact table have FK_DimCompanyCodeId = 5.

Query 2, using INNER JOIN:

SELECT f.[FK_DimCompanyCodeId], SUM(f.NetValueInUSD)
FROM [SPENDBY].[FactInvoiceDetail] f

INNER JOIN [SPENDBY].[DimCompanyCode] d ON (f.[FK_DimCompanyCodeId] = d.Id)
WHERE CONTAINS(d.*, 'Comcast')

GROUP BY f.[FK_DimCompanyCodeId]
ORDER BY SUM(f.NetValueInUSD) DESC

This query also seems to run forever and never returns a result.

Query 3, using a #temp table:

SELECT id INTO #temp FROM [SPENDBY].[DimCompanyCode] d
WHERE CONTAINS(d.*, 'Comcast');

SELECT f.[FK_DimCompanyCodeId], SUM(f.NetValueInUSD)
FROM [SPENDBY].[FactInvoiceDetail] f

WHERE EXISTS (
        SELECT * FROM #temp
        WHERE f.[FK_DimCompanyCodeId] = #temp.Id)

GROUP BY f.[FK_DimCompanyCodeId]

ORDER BY SUM(f.NetValueInUSD) DESC

This is very fast: it returns results after 5 seconds.

Why is full-text search so slow in cases 1 and 2?

sql sql-server full-text-search azure-sql-database star-schema
2 Answers

Votes: 1

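The problem is competing indexes: one for the JOIN and another for the filter. Perhaps a subquery will persuade SQL Server to use the full-text index first: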

SELECT f.[FK_DimCompanyCodeId], SUM(f.NetValueInUSD)
FROM [SPENDBY].[FactInvoiceDetail] f JOIN
     (SELECT id
      FROM [SPENDBY].[DimCompanyCode] cc
      WHERE CONTAINS(cc.*, 'Comcast')
     ) cc
     ON cc.id = f.FK_DimCompanyCodeId
GROUP BY f.[FK_DimCompanyCodeId]
ORDER BY SUM(f.NetValueInUSD) DESC

An index on FactInvoiceDetail(FK_DimCompanyCodeId) might also help.
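
As a rough sketch of that index suggestion: the index name below is hypothetical, and including NetValueInUSD is an assumption made so the SUM could be served from the index alone. The question says a nonclustered index on this foreign key already exists, so this only illustrates a covering variant.

```sql
-- Hypothetical index name. INCLUDE ([NetValueInUSD]) is an assumption:
-- it lets the aggregate read the value from the index leaf pages
-- instead of going back to the base table for each matching row.
CREATE NONCLUSTERED INDEX IX_FactInvoiceDetail_FK_DimCompanyCodeId
    ON [SPENDBY].[FactInvoiceDetail] ([FK_DimCompanyCodeId])
    INCLUDE ([NetValueInUSD]);
```

Because roughly 27 million of the 40 million fact rows match the single company found here, the optimizer may still prefer scanning the columnstore over seeking this index, so any gain should be measured rather than assumed.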


Votes: 0

In the end, I found that CONTAINS works well when it targets a specific column (for example, Description):

SELECT f.[FK_DimCompanyCodeId], SUM(f.NetValueInUSD)
FROM [SPENDBY].[FactInvoiceDetail] f
WHERE  f.[FK_DimCompanyCodeId] IN  (
        SELECT d.Id FROM [SPENDBY].[DimCompanyCode] d
        WHERE CONTAINS(d.[Description], 'Comcast')
)
GROUP BY f.[FK_DimCompanyCodeId]
ORDER BY SUM(f.NetValueInUSD) DESC

To search across all full-text indexed columns of the table, CONTAINSTABLE gives the best performance and avoids the #temp table:

SELECT f.[FK_DimCompanyCodeId], SUM(f.NetValueInUSD)
FROM [SPENDBY].[FactInvoiceDetail] f
LEFT OUTER JOIN CONTAINSTABLE([SPENDBY].[DimCompanyCode], *, '"Comcast"') ct 
ON f.[FK_DimCompanyCodeId] = ct.[Key]
WHERE ct.[Key] IS NOT NULL
GROUP BY f.[FK_DimCompanyCodeId]
ORDER BY SUM(f.NetValueInUSD) DESC
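
The LEFT OUTER JOIN followed by WHERE ct.[Key] IS NOT NULL keeps only the fact rows that matched a search hit, which is what an inner join expresses directly. A minimal sketch of that equivalent form, assuming (as the join above implies) that the full-text key of DimCompanyCode is its Id column:

```sql
-- Equivalent to the LEFT OUTER JOIN ... WHERE ct.[Key] IS NOT NULL form:
-- the inner join drops fact rows whose company key is not in the
-- CONTAINSTABLE result, so no separate NULL filter is needed.
SELECT f.[FK_DimCompanyCodeId], SUM(f.NetValueInUSD)
FROM [SPENDBY].[FactInvoiceDetail] f
INNER JOIN CONTAINSTABLE([SPENDBY].[DimCompanyCode], *, '"Comcast"') ct
    ON f.[FK_DimCompanyCodeId] = ct.[Key]
GROUP BY f.[FK_DimCompanyCodeId]
ORDER BY SUM(f.NetValueInUSD) DESC;
```

Whether this produces a different plan from the outer-join version is not something the thread confirms; it is mainly easier to read.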