Need to optimize a MongoDB query


I have a MongoDB query, and I have created all the necessary indexes. However, the query still scans every document in the collection.

Here is the query:

[
    {
        "$group": {
            "driverStatuses": {
                "$addToSet": "$driverStatus"
            },
            "batchAddedAt": {
                "$first": "$batchAddedAt"
            },
            "cityId": {
                "$first": "$operationRegion"
            },
            "pickLat": {
                "$first": "$pickLat"
            },
            "pickLng": {
                "$first": "$pickLng"
            },
            "_id": "$batchId"
        }
    },
    {
        "$match": {
            "batchAddedAt": {
                "$gte": 1705319256
            }
        }
    },
    {
        "$match": {
            "driverStatuses": {
                "$nin": [
                    "COMPLETED",
                    "CANCELLED",
                    "START_PICKUP",
                    "ACCEPTED",
                    "WAITING"
                ]
            }
        }
    },
    {
        "$sort": {
            "_id": 1
        }
    }
]

The corresponding slow query log entry is:

{
  "t": {
    "$date": "2024-01-16T11:47:41.418+00:00"
  },
  "s": "I",
  "c": "COMMAND",
  "id": 51803,
  "ctx": "conn486616",
  "msg": "Slow query",
  "attr": {
    "type": "command",
    "ns": "dms.batchAssignmentActivity",
    "command": {
      "aggregate": "batchAssignmentActivity",
      "pipeline": [
        {
          "$group": {
            "driverStatuses": {
              "$addToSet": "$driverStatus"
            },
            "batchAddedAt": {
              "$first": "$batchAddedAt"
            },
            "cityId": {
              "$first": "$operationRegion"
            },
            "pickLat": {
              "$first": "$pickLat"
            },
            "pickLng": {
              "$first": "$pickLng"
            },
            "_id": "$batchId"
          }
        },
        {
          "$match": {
            "batchAddedAt": {
              "$gte": 1705319256
            }
          }
        },
        {
          "$match": {
            "driverStatuses": {
              "$nin": [
                "COMPLETED",
                "CANCELLED",
                "START_PICKUP",
                "ACCEPTED",
                "WAITING"
              ]
            }
          }
        },
        {
          "$sort": {
            "_id": 1
          }
        }
      ],
      "cursor": {},
      "lsid": {
        "id": {
          "$uuid": "a1ecf292-f2a0-4ce9-8520-a66f86fd3a80"
        }
      },
      "$clusterTime": {
        "clusterTime": {
          "$timestamp": {
            "t": 1705405656,
            "i": 32
          }
        },
        "signature": {
          "hash": {
            "$binary": {
              "base64": "JyBotKPMVXqaekfqPtz6E4RwE20=",
              "subType": "0"
            }
          },
          "keyId": 7278270623686590000
        }
      },
      "$db": "dms",
      "$readPreference": {
        "mode": "primary"
      }
    },
    "planSummary": "COLLSCAN",
    "keysExamined": 0,
    "docsExamined": 276938,
    "hasSortStage": true,
    "cursorExhausted": true,
    "numYields": 367,
    "nreturned": 15,
    "queryHash": "09C9423E",
    "queryFramework": "sbe",
    "reslen": 2963,
    "locks": {
      "FeatureCompatibilityVersion": {
        "acquireCount": {
          "r": 369
        }
      },
      "Global": {
        "acquireCount": {
          "r": 369
        }
      },
      "Mutex": {
        "acquireCount": {
          "r": 2
        }
      }
    },
    "readConcern": {
      "level": "local",
      "provenance": "implicitDefault"
    },
    "writeConcern": {
      "w": 1,
      "wtimeout": 0,
      "provenance": "customDefault"
    },
    "storage": {
      "data": {
        "bytesRead": 203982595,
        "timeReadingMicros": 54403
      }
    },
    "remote": "10.20.194.26:60606",
    "protocol": "op_msg",
    "durationMillis": 4453
  }
}

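I really cannot figure out the cause. Why is no index used here?

The index details are shared in the screenshot (Indices details).

I have already created the indexes, but the query still scans all documents in the collection. I expected the query to use an index.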

Tags: mongodb, mongodb-query

1 Answer

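For aggregations, indexes are only used by the first stages of the pipeline (e.g. $match, $sort). In the aggregation in question, the first stage is a $group stage, which is a blocking stage: every input document has to be read before the pipeline can continue. Because nothing filters the collection ahead of the $group, the query runs as a collection scan without an index, which is why it is slow.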
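The log entry in the question shows exactly these symptoms. As a rough sketch, the relevant fields can be pulled out of the "attr" object of the log line. The helper name summarizeSlowQuery is hypothetical and is not part of MongoDB; the numbers are copied from the log above.

```javascript
// Hypothetical helper (not a MongoDB API): summarizes the "attr" object
// of a slow-query log entry to show whether an index was involved.
function summarizeSlowQuery(attr) {
  // Guard against division by zero when nothing was returned.
  const returned = Math.max(attr.nreturned, 1);
  return {
    collectionScan: attr.planSummary === "COLLSCAN",
    usedIndexKeys: attr.keysExamined > 0,
    docsExaminedPerReturned: attr.docsExamined / returned,
  };
}

// Values copied from the log entry in the question.
const summary = summarizeSlowQuery({
  planSummary: "COLLSCAN",
  keysExamined: 0,
  docsExamined: 276938,
  nreturned: 15,
});

console.log(summary);
```

For this log entry the summary reports a collection scan, no index keys examined, and roughly 18,000 documents read per document returned, which matches the explanation above.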

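Without knowing the document structure and the goal of the query, it is hard to provide a definitive solution, so please adapt the following example to your needs.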

An important guideline from the docs is that indexes can be used by the first stages of the pipeline when those stages are $match, $sort, and possibly $group.

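In your example, you can apply the $match at the beginning and then group the documents by batchId. Note that this filters individual documents before grouping, so it returns the same result as the original pipeline only if all documents of a batch share the same batchAddedAt value. I have omitted a few properties for brevity, but this should give you the idea: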

[
  {
    $match: {
      batchAddedAt: { $gte: 1705319256 },
    },
  },
  {
    $group: {
      _id: "$batchId",
      driverStatuses: {
        $addToSet: "$driverStatus",
      },
      batchAddedAt: {
        $first: "$batchAddedAt",
      },
      cityId: { $first: "$operationRegion" },
    },
  },
  {
    $match: {
      driverStatuses: {
        $nin: [
          "COMPLETED",
          "CANCELLED",
          "START_PICKUP",
          "ACCEPTED",
          "WAITING",
        ],
      },
    },
  },
  {
    $sort: { _id: 1 }
  }
]

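This can be supported by the index batchAddedAt_1. The $match at the beginning uses that index and reduces the number of documents pushed through the pipeline. The $group and the final $sort are not served by batchAddedAt_1 themselves, but they now only have to process the documents that passed the $match. Please adapt the example as needed.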
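Putting this together, here is a minimal sketch of how the reordered pipeline and the index could be set up from Node.js. It assumes `collection` is a collection handle from the official MongoDB Node.js driver pointing at batchAssignmentActivity. The names buildBatchPipeline, findOpenBatches, and EXCLUDED_STATUSES are made up for illustration, and the sketch includes the pickLat and pickLng fields from the question.

```javascript
// Statuses taken from the $nin list in the question.
const EXCLUDED_STATUSES = [
  "COMPLETED",
  "CANCELLED",
  "START_PICKUP",
  "ACCEPTED",
  "WAITING",
];

// Hypothetical helper: builds the reordered pipeline with $match first,
// so that an index on batchAddedAt can be used.
function buildBatchPipeline(sinceEpochSeconds, excludedStatuses) {
  return [
    { $match: { batchAddedAt: { $gte: sinceEpochSeconds } } },
    {
      $group: {
        _id: "$batchId",
        driverStatuses: { $addToSet: "$driverStatus" },
        batchAddedAt: { $first: "$batchAddedAt" },
        cityId: { $first: "$operationRegion" },
        pickLat: { $first: "$pickLat" },
        pickLng: { $first: "$pickLng" },
      },
    },
    { $match: { driverStatuses: { $nin: excludedStatuses } } },
    { $sort: { _id: 1 } },
  ];
}

// Hypothetical usage: `collection` is assumed to be a Node.js driver
// Collection for dms.batchAssignmentActivity.
async function findOpenBatches(collection, sinceEpochSeconds) {
  // Creates the index named batchAddedAt_1 by default; this does nothing
  // if an identical index already exists.
  await collection.createIndex({ batchAddedAt: 1 });
  const pipeline = buildBatchPipeline(sinceEpochSeconds, EXCLUDED_STATUSES);
  return collection.aggregate(pipeline).toArray();
}
```

In practice the index would normally be created once, outside the query path. It sits inside findOpenBatches here only to keep the sketch in one place.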
