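I was asked to demonstrate that searching on an indexed field is faster than searching on a string prefix, so I put together a quick test. The results surprised me, and I don't understand why.
The database consists of a single table (Products) with a product name field, a brand field, and a few other fields that are only there to bulk up the data. Brand has its own index: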
CREATE TABLE [dbo].[Products](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [ProductName] [nvarchar](500) NOT NULL,
    [Brand] [nvarchar](100) NOT NULL,
    [Field1] [nvarchar](50) NULL,
    [Field2] [nvarchar](50) NULL,
    [Field3] [nvarchar](50) NULL,
    [Field4] [nvarchar](50) NULL,
    [Field5] [nvarchar](50) NULL,
    CONSTRAINT [PK_Products] PRIMARY KEY CLUSTERED
    (
        [ID] ASC
    ),
    INDEX [Ix_Brands] NONCLUSTERED ([Brand])
)
I then fetch the products by brand using 4 different methods and time each one:
using System;
using System.Data;
using System.Data.SqlClient;
using System.Linq;

namespace SpeedTest
{
    internal class Program
    {
        static void Main(string[] args)
        {
            var connectionString = "data source=.<Redacted>";
            string[] Brands = new string[] { "Tesco", "Asda", "Boots", "Morrisons", "Amazon", "Ebay" };
            var rnd = new Random();
            DateTime startTime;

            using (var con = new SqlConnection(connectionString))
            {
                var cmd = new SqlCommand("delete from products", con);
                con.Open();
                cmd.ExecuteNonQuery();

                Console.WriteLine("Creating 100,000 products");
                for (int i = 0; i < 100000; i++)
                {
                    var brand = Brands[rnd.Next(Brands.Length)];
                    cmd.CommandText = $"insert into products(productName, brand, field1, field2, field3, field4, field5) values ('{brand}_{Guid.NewGuid()}', '{brand}', '{Guid.NewGuid()}', '{Guid.NewGuid()}', '{Guid.NewGuid()}', '{Guid.NewGuid()}', '{Guid.NewGuid()}')";
                    cmd.ExecuteNonQuery();
                }

                Console.WriteLine("Getting products by brand via ADO and product name prefix");
                startTime = DateTime.Now;
                foreach (var brand in Brands)
                {
                    cmd.CommandText = $"select * from products where productName like '{brand}_%'";
                    var da = new SqlDataAdapter(cmd);
                    var dt = new DataTable();
                    da.Fill(dt);
                }
                Console.WriteLine($"Time taken: {(DateTime.Now - startTime).TotalMilliseconds}ms");

                Console.WriteLine("Getting products by brand via ADO and indexed brand field");
                startTime = DateTime.Now;
                foreach (var brand in Brands)
                {
                    cmd.CommandText = $"select * from products where brand='{brand}'";
                    var da = new SqlDataAdapter(cmd);
                    var dt = new DataTable();
                    da.Fill(dt);
                }
                Console.WriteLine($"Time taken: {(DateTime.Now - startTime).TotalMilliseconds}ms");

                con.Close();
            }

            var db = new SpeedTestEntities();

            Console.WriteLine("Getting products by brand via entity framework and product name prefix");
            startTime = DateTime.Now;
            foreach (var brand in Brands)
            {
                var products = db.Products.Where(p => p.ProductName.StartsWith(brand + "_")).ToList();
            }
            Console.WriteLine($"Time taken: {(DateTime.Now - startTime).TotalMilliseconds}ms");

            Console.WriteLine("Getting products by brand via entity framework and indexed brand field");
            startTime = DateTime.Now;
            foreach (var brand in Brands)
            {
                var products = db.Products.Where(p => p.Brand.Equals(brand, StringComparison.OrdinalIgnoreCase)).ToList();
            }
            Console.WriteLine($"Time taken: {(DateTime.Now - startTime).TotalMilliseconds}ms");

            Console.ReadLine();
        }
    }
}
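The Entity Framework results are what I expected, but the ADO results show the search on the indexed field being slower than the search on the product name prefix, which surely can't be right: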
Creating 100,000 products
Getting products by brand via ADO and product name prefix
Time taken: 558.9306ms
Getting products by brand via ADO and indexed brand field
Time taken: 642.5258ms
Getting products by brand via entity framework and product name prefix
Time taken: 3266.8438ms
Getting products by brand via entity framework and indexed brand field
Time taken: 204.932ms
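I must have messed something up, but I can't see where. My demo of why we should add an indexed field to a table rather than encoding the value as a prefix of another string is going badly.
Could someone rescue me and take a look at what's going on here?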
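First, a search for rows where Brand equals 'Tesco' and a search for rows where Brand starts with 'Tesco' can both use the index to seek to the first matching row. Check the execution plans and you'll see.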
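To illustrate why both predicates can seek, here is a toy model in C#. It is a sketch of the idea, not how SQL Server is implemented: an index keeps its keys sorted, so all keys equal to a value, and all keys starting with a constant prefix, sit in one contiguous run that a binary search can locate. ToyIndex, LowerBound, SeekEquals and SeekPrefix are hypothetical names, and the sketch sorts with ordinal comparison, whereas a real SQL Server index is a B-tree ordered by the column's collation.

```csharp
using System;
using System.Collections.Generic;

// Toy model of an index (hypothetical, for illustration only): a list of keys
// kept in ordinal sort order. Real SQL Server indexes are B-trees ordered by collation.
public static class ToyIndex
{
    // Binary search for the first position whose key is >= target.
    static int LowerBound(List<string> keys, string target)
    {
        int lo = 0, hi = keys.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (string.CompareOrdinal(keys[mid], target) < 0) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // Equality: seek to the first candidate, then read forward while keys are equal.
    public static List<string> SeekEquals(List<string> keys, string value)
    {
        var result = new List<string>();
        for (int i = LowerBound(keys, value); i < keys.Count && keys[i] == value; i++)
            result.Add(keys[i]);
        return result;
    }

    // Constant prefix: the prefix itself is the smallest possible matching key,
    // so seek to it and read forward while keys still start with it.
    public static List<string> SeekPrefix(List<string> keys, string prefix)
    {
        var result = new List<string>();
        for (int i = LowerBound(keys, prefix);
             i < keys.Count && keys[i].StartsWith(prefix, StringComparison.Ordinal); i++)
            result.Add(keys[i]);
        return result;
    }
}
```

Both methods do the same amount of work apart from the stopping condition: one seek, then a short ordered read. Neither looks at keys outside the matching run.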
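Your test has one big problem, and perhaps a minor one as well. First, timing how long it takes to retrieve the query results mostly measures how long it takes to send the data to the client. Each query returns every column for roughly a sixth of the 100,000 rows, so that transfer dominates the timing. Instead, measure the query's CPU time by looking at the actual execution plan or by using SET STATISTICS TIME ON.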
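If you want the server-side numbers from the test program itself rather than from SSMS, one option (an assumption about how you might wire it up, not part of the original test) is to prefix each command with SET STATISTICS TIME ON; and collect the informational messages through the SqlConnection.InfoMessage event. The hypothetical StatisticsTimeParser below only totals the "CPU time = N ms, elapsed time = M ms." lines found in that message text, and it assumes SQL Server's usual English message wording.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: sums the CPU and elapsed times that SQL Server reports as
// informational messages while SET STATISTICS TIME ON is active.
public static class StatisticsTimeParser
{
    // Matches lines such as "CPU time = 15 ms,  elapsed time = 120 ms."
    private static readonly Regex TimeLine = new Regex(
        @"CPU time = (\d+) ms,\s*elapsed time = (\d+) ms",
        RegexOptions.Compiled);

    // Adds up every CPU/elapsed pair in the text (parse/compile plus execution).
    public static (long CpuMs, long ElapsedMs) Sum(string messages)
    {
        long cpu = 0, elapsed = 0;
        foreach (Match m in TimeLine.Matches(messages))
        {
            cpu += long.Parse(m.Groups[1].Value);
            elapsed += long.Parse(m.Groups[2].Value);
        }
        return (cpu, elapsed);
    }
}
```

To feed it, you could collect messages with something like var log = new StringBuilder(); con.InfoMessage += (s, e) => log.AppendLine(e.Message); before running the queries, then pass log.ToString() to StatisticsTimeParser.Sum. CPU time, unlike elapsed time, does not include the time the server spends waiting for the client to consume the rows.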
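The minor problem is that the prefix you are searching for is not what you think it is:
like '{brand}_%'
In LIKE, _ is a wildcard that matches exactly one arbitrary character, not a literal underscore, so this pattern would also match a name such as 'TescoX123'. You can see that _ always consumes exactly one character: this returns no rows.
select 1 where 'Tesco' like 'Tesco_'
To match a literal underscore, escape it, for example like '{brand}[_]%'.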
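The sketch below models that LIKE behaviour in C# so patterns can be checked without a server. LikeDemo.Like is a deliberately simplified, hypothetical translation of a T-SQL LIKE pattern into a regular expression: it handles %, _ and single-character bracket classes such as [_], and it ignores collation and case sensitivity. LikeDemo.PrefixPattern (also hypothetical) escapes %, _ and [ with brackets so that a prefix is matched literally.

```csharp
using System.Text;
using System.Text.RegularExpressions;

// Illustration only: a simplified model of T-SQL LIKE. Supports %, _ and
// single-character [x] classes; ignores collation, case rules and [a-z] ranges.
public static class LikeDemo
{
    // Translates the LIKE pattern into an anchored .NET regex and tests the input.
    public static bool Like(string input, string pattern)
    {
        var rx = new StringBuilder("^");
        for (int i = 0; i < pattern.Length; i++)
        {
            char c = pattern[i];
            if (c == '%') rx.Append(".*");          // any run of characters
            else if (c == '_') rx.Append('.');      // exactly one character
            else if (c == '[' && i + 2 < pattern.Length && pattern[i + 2] == ']')
            {
                // [x] matches the single literal character x, e.g. [_] or [%]
                rx.Append(Regex.Escape(pattern[i + 1].ToString()));
                i += 2;
            }
            else rx.Append(Regex.Escape(c.ToString()));
        }
        rx.Append('$');
        return Regex.IsMatch(input, rx.ToString(), RegexOptions.Singleline);
    }

    // Builds "prefix%" with LIKE's special characters bracket-escaped.
    public static string PrefixPattern(string prefix)
    {
        var sb = new StringBuilder();
        foreach (char c in prefix)
        {
            if (c == '%' || c == '_' || c == '[') sb.Append('[').Append(c).Append(']');
            else sb.Append(c);
        }
        return sb.Append('%').ToString();
    }
}
```

In the test program you could then pass the escaped pattern as a parameter instead of splicing it into the SQL, for example cmd.CommandText = "select * from products where productName like @pattern"; followed by cmd.Parameters.AddWithValue("@pattern", LikeDemo.PrefixPattern(brand + "_")); and calling cmd.Parameters.Clear() between iterations, since the same command object is reused. T-SQL's ESCAPE clause, as in like 'Tesco!_%' escape '!', is an equivalent alternative to bracket escaping.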