MongoDB index usage

Question

Good day! I have run into a situation with MongoDB index usage.

I have a query:

db.my_collection.find({ "$and": [ { "expires_at": { "$ne": ISODate("0001-01-01T00:00:00Z") } }, { "expires_at": { "$lte": ISODate("2024-03-29T16:00:00Z") } }, { "operation.updated_at": { "$lte": ISODate("2024-03-29T16:00:00Z") } }], "operation.status": 3, "is_in_some_state": true, "system_type": "type3", "some_status": 1 })

I have a database with about one million records. I created these indexes:

operation.status_1_expires_at_1
// some other indexes for other requests
expires_at_1_operation.updated_at_1_operation.status_1_is_in_some_state_1_system_type_1_some_status_1

I expected the second index (the one that matches the query) to be used, but when I run explain(), I see that the operation.status_1_expires_at_1 index was chosen instead.
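As a rough illustration of how the chosen index can be read out of an explain() result, here is a minimal sketch. The `sampleExplain` object is a hypothetical, heavily trimmed stand-in for real output, and it assumes the classic plan shape where stages are nested under `inputStage` / `inputStages`; newer server versions may wrap the plan differently. The helper name `indexNamesIn` is made up for this example.

```javascript
// Hypothetical, trimmed-down explain() result; a real one has many more fields.
const sampleExplain = {
  queryPlanner: {
    winningPlan: {
      stage: 'FETCH',
      inputStage: {
        stage: 'IXSCAN',
        indexName: 'operation.status_1_expires_at_1',
      },
    },
    rejectedPlans: [
      {
        stage: 'FETCH',
        inputStage: {
          stage: 'IXSCAN',
          indexName:
            'expires_at_1_operation.updated_at_1_operation.status_1_is_in_some_state_1_system_type_1_some_status_1',
        },
      },
    ],
  },
};

// Walk a plan tree and collect the name of every index scan stage in it.
function indexNamesIn(plan) {
  if (!plan) return [];
  const own = plan.stage === 'IXSCAN' ? [plan.indexName] : [];
  const children = [plan.inputStage, ...(plan.inputStages || [])];
  return own.concat(...children.map((child) => indexNamesIn(child)));
}

const winning = indexNamesIn(sampleExplain.queryPlanner.winningPlan);
const rejected = sampleExplain.queryPlanner.rejectedPlans.flatMap((plan) =>
  indexNamesIn(plan)
);

console.log('winning plan uses:', winning);
console.log('rejected plans use:', rejected);
```

In mongosh, the same helper could be pointed at the object returned by `db.my_collection.find(...).explain()` instead of the sample.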

So, the first question: how and why does this happen?

If I tell MongoDB to use the second (exact) index with hint(), the search gets about 30% faster.

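I tried creating a few other indexes just to see whether anything changed, and when I created a system_type_1 index, the search got even faster. Here comes the second question: why? Isn't the exact index supposed to be better?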

Summary

operation.status_1_expires_at_1 - 1 s+ per request
expires_at_1_operation.updated_at_1_operation.status_1_is_in_some_state_1_system_type_1_some_status_1 - ~700 ms per request
system_type_1 - 70 ms per request

I ran an aggregation to show how the data is distributed:

db.my_collection.aggregate([{$group:{_id:"$system_type",count:{$sum:1}}},{$sort:{count:-1}}])
{ "_id" : "type1", "count" : 637289 }
{ "_id" : "type2", "count" : 295798 }
{ "_id" : "type3", "count" : 80788 }
{ "_id" : "type4", "count" : 5 }
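To put these counts in perspective, the share of the collection matched by each system_type value can be worked out directly. This is a small sketch in plain JavaScript using only the counts from the output above; the `share` helper is a name made up for this example. It shows that type3 covers roughly 8% of the documents, while type1 covers well over half.

```javascript
// Counts copied from the aggregation output above.
const counts = { type1: 637289, type2: 295798, type3: 80788, type4: 5 };

// Total number of documents across all system_type values.
const total = Object.values(counts).reduce((sum, n) => sum + n, 0);

// Fraction of the collection that an equality match on system_type selects.
function share(type) {
  return counts[type] / total;
}

for (const type of Object.keys(counts)) {
  console.log(type, (share(type) * 100).toFixed(2) + '%');
}
```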

Even when I search with type1 as the system_type, I can see that only the system_type_1 index is used. Why?

Any help would be appreciated.

Answer 1

Your query can be broken down into the following three simpler query conditions:

const condition1 = {
  expires_at: { $ne: ISODate('0001-01-01T00:00:00Z') },
  'operation.status': 3,
  is_in_some_state: true,
  system_type: 'type3',
  some_status: 1,
};

const condition2 = {
  expires_at: { $lte: ISODate('2024-03-29T16:00:00Z') },
  'operation.status': 3,
  is_in_some_state: true,
  system_type: 'type3',
  some_status: 1,
};

const condition3 = {
  'operation.updated_at': { $lte: ISODate('2024-03-29T16:00:00Z') },
  'operation.status': 3,
  is_in_some_state: true,
  system_type: 'type3',
  some_status: 1,
};

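The order of fields in a compound index is crucial (it is the key to how the database uses an index). The index expires_at_1_operation.updated_at_1_operation.status_1_is_in_some_state_1_system_type_1_some_status_1 does not help much for a query using condition3, because operation.updated_at is not the leading field of that index. The condition on expires_at in condition1 is an inequality ($ne), so the index does not help much there either. The reason this index is still faster than operation.status_1_expires_at_1 is simply that every field in the query is present in the index, so the database can filter on the index entries without reading the documents from disk.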

You should create an index like this:

{
  'operation.status': 1,
  is_in_some_state: 1,
  system_type: 1,
  some_status: 1,
}
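For reference, a key specification like the one above would typically be passed to createIndex in mongosh. The sketch below keeps to plain JavaScript so it can run without a server: it builds the specification and derives the name MongoDB assigns by default (each field and its direction joined with underscores), which is the naming scheme the index names in the question follow. `defaultIndexName` is a helper made up for this example, not a MongoDB API.

```javascript
// Key specification for the suggested equality-only compound index.
const indexSpec = {
  'operation.status': 1,
  is_in_some_state: 1,
  system_type: 1,
  some_status: 1,
};

// Default index names are built as field_direction pairs joined by "_".
function defaultIndexName(spec) {
  return Object.entries(spec)
    .map(([field, direction]) => `${field}_${direction}`)
    .join('_');
}

console.log(defaultIndexName(indexSpec));

// In mongosh, the index itself would be created with:
//   db.my_collection.createIndex(indexSpec)
```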

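The order of these fields depends on the distribution of your data; the fields that filter out the most documents should come first. If needed, you can create two indexes: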

{
  'operation.status': 1,
  is_in_some_state: 1,
  system_type: 1,
  some_status: 1,
  expires_at: 1
}
{
  'operation.status': 1,
  is_in_some_state: 1,
  system_type: 1,
  some_status: 1,
  'operation.updated_at': 1
}

expires_at and operation.updated_at should be placed last.
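The "equality fields first, range fields last" ordering can be sketched as a small helper. This is an illustration only, under stated assumptions: the `$and` from the question is flattened into one object (the two expires_at clauses are merged, which is equivalent for scalar fields), `ISODate` is replaced by `Date` so the snippet runs outside mongosh, and any operator other than `$eq` is treated as a range predicate, which is a simplification. The names `isRangePredicate` and `orderIndexFields` are made up for this example.

```javascript
// Flattened version of the query from the question (assumption: scalar fields).
const filter = {
  expires_at: {
    $ne: new Date('0001-01-01T00:00:00Z'),
    $lte: new Date('2024-03-29T16:00:00Z'),
  },
  'operation.updated_at': { $lte: new Date('2024-03-29T16:00:00Z') },
  'operation.status': 3,
  is_in_some_state: true,
  system_type: 'type3',
  some_status: 1,
};

// True when the value is an operator object such as { $lte: ... } or { $ne: ... }.
// Simplification: every operator except $eq counts as a range predicate.
function isRangePredicate(value) {
  if (value === null || typeof value !== 'object' || value instanceof Date) {
    return false;
  }
  return Object.keys(value).some((op) => op.startsWith('$') && op !== '$eq');
}

// Put equality-matched fields first and range-matched fields last.
function orderIndexFields(query) {
  const fields = Object.keys(query);
  const equality = fields.filter((f) => !isRangePredicate(query[f]));
  const range = fields.filter((f) => isRangePredicate(query[f]));
  return [...equality, ...range];
}

console.log(orderIndexFields(filter));
```

Within the equality group the helper simply keeps the order in which the fields appear in the query; choosing that order by how much each field filters out still has to be done by hand from the data distribution.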
