I have two collections:
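My goal is to tune the indexes or change the query pipeline to optimize a query that answers this: how many active users have the specific tag tag1 and were tagged at least 2 times within a date range? My mongo pipeline would be: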
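[
  // Stage 1: keep only tag1 entries inside the date range
  { $match : {
    date: { $gte: ..., $lte: ... },
    name: 'tag1'
  } },
  // Stage 2: sum the tag counts per customer id
  { $group: {
    _id: '$cid',
    totalCount: { $sum: '$count' }
  } },
  // Stage 3: keep customers tagged at least twice
  { $match: {
    totalCount: { $gte: 2 }
  } },
  // Stage 4: join against Customer, keeping only active customers
  { $lookup: {
    from: 'Customer',
    let: { cid: '$_id' },  // binds $$cid used in the sub-pipeline
    pipeline: [
      { $match: {
        "$expr": {
          "$and": [
            { "$eq": [ "$_id", "$$cid" ] },
            { "$eq": [ "$state", "active" ] },
          ]
        }
      } }
    ],
    as: 'Customer'
  } },
  // Stage 5
  { "$unwind": "$Customer" },
  // Stage 6
  { "$count": "count" }
]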
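To make the intent of the six stages concrete, here is a minimal in-memory sketch in plain JavaScript of what the pipeline computes. The sample documents and the function name `countActiveTaggedUsers` are made up for illustration; only the field names (`cid`, `name`, `date`, `count`, `state`) come from the pipeline, and the numeric date format is assumed from the index bounds in the explain output.

```javascript
// Illustrative data only: the values are invented, the field names follow the pipeline.
const tags = [
  { cid: 'c1', name: 'tag1', date: 20230105, count: 1 },
  { cid: 'c1', name: 'tag1', date: 20230210, count: 1 },
  { cid: 'c2', name: 'tag1', date: 20230301, count: 1 },
  { cid: 'c3', name: 'tag2', date: 20230301, count: 5 },
  { cid: 'c4', name: 'tag1', date: 20220101, count: 3 },
];
const customers = [
  { _id: 'c1', state: 'active' },
  { _id: 'c2', state: 'active' },
  { _id: 'c4', state: 'active' },
];

function countActiveTaggedUsers(tags, customers, name, from, to, minCount) {
  // Stages 1 and 2: filter by name and date range, then sum count per cid.
  const totals = new Map();
  for (const t of tags) {
    if (t.name !== name || t.date < from || t.date > to) continue;
    totals.set(t.cid, (totals.get(t.cid) || 0) + t.count);
  }
  // Stage 4 equivalent: the set of active customer ids.
  const active = new Set(
    customers.filter((c) => c.state === 'active').map((c) => c._id)
  );
  // Stages 3, 5 and 6: apply the threshold, keep active customers, count them.
  let n = 0;
  for (const [cid, total] of totals) {
    if (total >= minCount && active.has(cid)) n += 1;
  }
  return n;
}

const result = countActiveTaggedUsers(tags, customers, 'tag1', 20230101, 20231231, 2);
console.log(result);
```

Only `c1` qualifies here: `c2` has a single tag, `c3` carries a different tag, and `c4` falls outside the date range.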
I tried adding this index:
{
"date" : -1.0,
"name" : 1.0,
"cid" : 1.0
}
Below is the explain("executionStats") result without the $lookup (stages 4 and 5). The $match takes only 802 ms, but the $group part takes most of the time (16610 - 5598 = 11012 ms).
[
{
$cursor: {
executionStats: {
executionSuccess: true,
nReturned: 2632494,
executionTimeMillis: 17937,
totalKeysExamined: 2632494,
totalDocsExamined: 2632494,
executionStages: {
stage: 'PROJECTION_SIMPLE',
nReturned: 2632494,
executionTimeMillisEstimate: 802,
inputStage: {
stage: 'FETCH',
nReturned: 2632494,
executionTimeMillisEstimate: 636,
docsExamined: 2632494,
inputStage: {
stage: 'IXSCAN',
nReturned: 2632494,
executionTimeMillisEstimate: 376,
indexName: 'date_1_name_1_cid_1',
direction: 'forward',
indexBounds: {
date: ['[20230101.0, 20231231.0]'],
name: ['["tag1"]'],
cid: ['[MinKey, MaxKey]'],
},
keysExamined: 2632494,
seeks: 1,
},
},
},
},
},
nReturned: NumberLong(2632494),
executionTimeMillisEstimate: NumberLong(5598),
},
{
$group: {
_id: '$cid',
totalCount: { $sum: '$count' },
},
nReturned: NumberLong(2632494),
executionTimeMillisEstimate: NumberLong(16610),
},
{
$match: {
totalCount: { $gte: 2.0 },
},
nReturned: NumberLong(1),
executionTimeMillisEstimate: NumberLong(17537),
},
{
$group: {
_id: { $const: null },
count: { $sum: { $const: 1 } }
},
nReturned: NumberLong(1),
executionTimeMillisEstimate: NumberLong(17933),
},
{
$project: {
count: true,
_id: false
},
nReturned: NumberLong(1),
executionTimeMillisEstimate: NumberLong(17933),
},
];
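The subtraction above treats each stage's executionTimeMillisEstimate as cumulative, that is, as including the time of all stages before it. Under that assumption, a small sketch like the following turns the figures from this explain output into per-stage costs. The helper name `perStageMillis` is made up for illustration.

```javascript
// Cumulative executionTimeMillisEstimate values copied from the explain output above.
const stages = [
  { stage: '$cursor', cumulativeMs: 5598 },
  { stage: '$group', cumulativeMs: 16610 },
  { stage: '$match', cumulativeMs: 17537 },
  { stage: '$group (count)', cumulativeMs: 17933 },
  { stage: '$project', cumulativeMs: 17933 },
];

// Subtract the previous stage's cumulative estimate to get each stage's own share.
function perStageMillis(stages) {
  return stages.map((s, i) => ({
    stage: s.stage,
    ms: s.cumulativeMs - (i === 0 ? 0 : stages[i - 1].cumulativeMs),
  }));
}

const deltas = perStageMillis(stages);
console.log(deltas);
```

With these numbers the first $group accounts for 11012 ms, while the later $match and counting stages add 927 ms and 396 ms.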
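I also tried adding count as the last field of the compound index. The result is only slightly better. The PROJECTION_SIMPLE stage has become PROJECTION_COVERED, the FETCH stage is gone, and totalDocsExamined shows 0. This is good, because the new index covers all the fields needed before entering the $group stage. However, the group still takes most of the time (15033 - 4239 = 10794 ms):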
[
{
$cursor: {
executionStats: {
executionTimeMillis: 16285,
totalKeysExamined : 2632494,
totalDocsExamined : 0,
executionStages: {
stage: 'PROJECTION_COVERED',
executionTimeMillisEstimate: 479,
inputStage: {
stage: 'IXSCAN',
executionTimeMillisEstimate: 370,
indexName: 'date_1_name_1_cid_1_count_1',
indexBounds: {
date: ['[20230101.0, 20231231.0]'],
name: ['["tag1"]'],
cid: ['[MinKey, MaxKey]'],
count: ['[MinKey, MaxKey]'],
},
keysExamined: 2632494,
seeks: 1,
},
},
},
},
nReturned: NumberLong(2632494),
executionTimeMillisEstimate: NumberLong(4239),
},
{
$group: {
_id: '$cid',
totalCount: { $sum: '$count' },
},
nReturned: NumberLong(2632494),
executionTimeMillisEstimate: NumberLong(15033),
},
{
$match: {
totalCount: { $gte: 2.0 },
},
nReturned: NumberLong(1),
executionTimeMillisEstimate: NumberLong(15863),
},
{
$group: {
_id: { $const: null },
count: { $sum: { $const: 1 } }
},
nReturned: NumberLong(1),
executionTimeMillisEstimate: NumberLong(16285),
},
{
$project: {
count: true,
_id: false
},
nReturned: NumberLong(1),
executionTimeMillisEstimate: NumberLong(16285),
},
]
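Is there any other way to improve the time?
This part really drives me crazy. I added the following index on the Customer collection, hoping the lookup could make full use of it.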
{
_id: 1.0,
state: 1.0,
}
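After adding the $lookup, the query needs an additional 766,596 - 14,391 = 752,205 ms to complete (OMG...). In theory, since I have a compound index on Customer that contains all the fields required by the lookup match, the lookup should be done with an IXSCAN. And because no additional fields are needed to unwind and count, I think the query could be completed without fetching any documents. Since I can't find a way to make explain show what happens inside the $lookup stage, I don't know where to improve.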
[
...
{
$match: {
totalDensity: { $gte: 1.0 },
},
nReturned: NumberLong(2632494),
executionTimeMillisEstimate: NumberLong(14391),
},
{
$lookup: { ... },
nReturned: NumberLong(2313852),
executionTimeMillisEstimate: NumberLong(766596),
},
{
$group: {
_id: { $const: null },
count: { $sum: { $const: 1 } },
},
nReturned: NumberLong(1),
executionTimeMillisEstimate: NumberLong(766596),
},
{
$project: {
count: true,
_id: false,
},
nReturned: NumberLong(1),
executionTimeMillisEstimate: NumberLong(766596),
}
]
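One variation that may be worth testing, as a sketch rather than a confirmed fix: MongoDB 5.0 and later accept localField/foreignField together with pipeline in a single $lookup, so the join on _id becomes a plain equality match and only the state filter stays in the sub-pipeline. Whether this is faster on this data set is an assumption to verify with explain. The variable name `lookupStage` is illustrative.

```javascript
// Sketch only: assumes MongoDB 5.0+, where localField/foreignField can be
// combined with pipeline in a single $lookup stage.
const lookupStage = {
  $lookup: {
    from: 'Customer',
    localField: '_id',    // after $group, _id holds the customer id (cid)
    foreignField: '_id',  // equality join on Customer._id
    pipeline: [
      { $match: { state: 'active' } },  // keep only active customers
      { $project: { _id: 1 } },         // nothing else is needed for counting
    ],
    as: 'Customer',
  },
};

console.log(JSON.stringify(lookupStage, null, 2));
```

This stage would take the place of stage 4 in the pipeline; the rest of the pipeline stays unchanged.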
You don't need
{ "$unwind": "$Customer" }
. This should be faster:
{ $project: { size: { $size: "$Customer" } } },
{ $group: { _id: null, count: { $sum: "$size" } } }
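To see why the two forms give the same number, here is a small in-memory sketch in plain JavaScript. The `afterLookup` documents are invented to stand in for the output of the $lookup stage; the two computations mimic $unwind followed by $count, and $project with $size followed by $group with $sum.

```javascript
// Illustrative $lookup output: each grouped document carries a Customer array
// that is empty when no active customer matched.
const afterLookup = [
  { _id: 'c1', totalCount: 2, Customer: [{ _id: 'c1', state: 'active' }] },
  { _id: 'c2', totalCount: 3, Customer: [] },
  { _id: 'c3', totalCount: 5, Customer: [{ _id: 'c3', state: 'active' }] },
];

// $unwind + $count: one output document per array element, then count them.
const unwound = afterLookup.flatMap((d) =>
  d.Customer.map((c) => ({ ...d, Customer: c }))
);
const countViaUnwind = unwound.length;

// $project { size: { $size: "$Customer" } } + $group { count: { $sum: "$size" } }
const countViaSize = afterLookup
  .map((d) => ({ size: d.Customer.length }))
  .reduce((acc, d) => acc + d.size, 0);

console.log(countViaUnwind, countViaSize);
```

One behavioural difference to keep in mind: if documents reach the final stage but none has a matching customer, the $group form returns a document with count 0, whereas $unwind followed by $count returns no document at all.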