Optimizing the query and indexes for a mongo $group, $lookup and $count query

Question (0 votes, 1 answer)

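I have two collections:

  • Customer
    • id
    • state: one of ['active', 'inactive']
  • Tag
    • cid: maps to Customer.id
    • date: YYYYMMDD integer
    • name: tag name
    • count: number of times the user was tagged with the given tag name on the given date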

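My goal is to tune the indexes or change the query pipeline so that I can efficiently answer this question: how many active users carry the tag 'tag1' and were tagged at least 2 times within a given date range? My mongo pipeline is: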

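[
    // stage 1: keep only 'tag1' documents inside the date range
    { $match : {
        date: { $gte: ..., $lte: ... },
        name: 'tag1'
    } },
    // stage 2: total the per-day counts for each customer
    { $group: {
        _id: '$cid',
        totalCount: { $sum: '$count' }
    } },
    // stage 3: keep customers tagged at least twice
    { $match: {
        totalCount: { $gte: 2 }
    } },
    // stage 4: join the matching active customer, if any
    { $lookup: {
        from: 'Customer',
        let: { cid: '$_id' },   // defines $$cid used in the sub-pipeline
        pipeline: [
            { $match: {
                "$expr": {
                    "$and": [
                        { "$eq": [ "$_id", "$$cid" ] },
                        { "$eq": [ "$state", "active" ] },
                    ]
                }
            } }
        ],
        as: 'Customer'
    } },
    // stage 5
    { "$unwind": "$Customer" },
    // stage 6
    { "$count": "count" }
]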

Part 1 (stages 1-3)

I tried adding this index:

{
    "date" : -1.0,
    "name" : 1.0,
    "cid" : 1.0
}

Below is the explain("executionStats") result without the $lookup and $unwind (stages 4 and 5). The $match takes only 802 ms, but the $group takes most of the time (16610 - 5598 = 11012 ms).

[
    {
        $cursor: {
            executionStats: {
                executionSuccess: true,
                nReturned: 2632494,
                executionTimeMillis: 17937,
                totalKeysExamined: 2632494,
                totalDocsExamined: 2632494,
                executionStages: {
                    stage: 'PROJECTION_SIMPLE',
                    nReturned: 2632494,
                    executionTimeMillisEstimate: 802,
                    inputStage: {
                        stage: 'FETCH',
                        nReturned: 2632494,
                        executionTimeMillisEstimate: 636,
                        docsExamined: 2632494,
                        inputStage: {
                            stage: 'IXSCAN',
                            nReturned: 2632494,
                            executionTimeMillisEstimate: 376,
                            indexName: 'date_1_name_1_cid_1',
                            direction: 'forward',
                            indexBounds: {
                                date: ['[20230101.0, 20231231.0]'],
                                name: ['["tag1"]'],
                                cid: ['[MinKey, MaxKey]'],
                            },
                            keysExamined: 2632494,
                            seeks: 1,
                        },
                    },
                },
            },
        },
        nReturned: NumberLong(2632494),
        executionTimeMillisEstimate: NumberLong(5598),
    },
    {
        $group: {
            _id: '$cid',
            totalCount: { $sum: '$count' },
        },
        nReturned: NumberLong(2632494),
        executionTimeMillisEstimate: NumberLong(16610),
    },
    {
        $match: {
            totalCount: { $gte: 2.0 },
        },
        nReturned: NumberLong(1),
        executionTimeMillisEstimate: NumberLong(17537),
    },
    {
        $group: {
            _id: { $const: null },
            count: { $sum: { $const: 1 } }
        },
        nReturned: NumberLong(1),
        executionTimeMillisEstimate: NumberLong(17933),
    },
    {
        $project: { 
            count: true, 
            _id: false
        },
        nReturned: NumberLong(1),
        executionTimeMillisEstimate: NumberLong(17933),
    },
];
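
The subtraction used above can be applied to every stage. Here is a minimal sketch, assuming (as the 16610 - 5598 calculation does) that each executionTimeMillisEstimate is cumulative and includes the stages before it. The helper name perStageMillis and the stage labels are made up for illustration; the numbers are copied from the explain output.

```javascript
// Hypothetical helper: converts cumulative per-stage estimates into
// the time attributable to each stage on its own.
// Assumption: every value includes the time of all earlier stages.
function perStageMillis(stages) {
  let previous = 0;
  return stages.map(({ name, cumulativeMillis }) => {
    const own = cumulativeMillis - previous;
    previous = cumulativeMillis;
    return { name, own };
  });
}

// Cumulative executionTimeMillisEstimate values from the explain output.
const stages = [
  { name: "$cursor", cumulativeMillis: 5598 },
  { name: "$group", cumulativeMillis: 16610 },
  { name: "$match", cumulativeMillis: 17537 },
  { name: "$group (count)", cumulativeMillis: 17933 },
  { name: "$project", cumulativeMillis: 17933 },
];

const breakdown = perStageMillis(stages);
console.log(breakdown);
```

Read this way, the first $group accounts for 11012 ms, while the later stages add well under a second each.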

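I also tried appending count as the last field of the compound index. The result is only slightly better. The PROJECTION_SIMPLE stage has become PROJECTION_COVERED, the FETCH stage is gone, and totalDocsExamined shows 0. That is good, because the new index covers all the data needed before entering the $group stage. However, the $group still takes most of the time (15033 - 4239 = 10794 ms).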

[
    {
        $cursor: {
            executionStats: {
                executionTimeMillis: 16285,
                totalKeysExamined : 2632494,
                totalDocsExamined : 0,
                executionStages: {
                    stage: 'PROJECTION_COVERED',
                    executionTimeMillisEstimate: 479,
                    inputStage: {
                        stage: 'IXSCAN',
                        executionTimeMillisEstimate: 370,
                        indexName: 'date_1_name_1_cid_1_count_1',
                        indexBounds: {
                            date: ['[20230101.0, 20231231.0]'],
                            name: ['["tag1"]'],
                            cid: ['[MinKey, MaxKey]'],
                            count: ['[MinKey, MaxKey]'],
                        },
                        keysExamined: 2632494,
                        seeks: 1,
                    },
                },
            },
        },
        nReturned: NumberLong(2632494),
        executionTimeMillisEstimate: NumberLong(4239),
    },
    {
        $group: {
            _id: '$cid',
            totalCount: { $sum: '$count' },
        },
        nReturned: NumberLong(2632494),
        executionTimeMillisEstimate: NumberLong(15033),
    },
    {
        $match: {
            totalCount: { $gte: 2.0 },
        },
        nReturned: NumberLong(1),
        executionTimeMillisEstimate: NumberLong(15863),
    },
    {
        $group: {
            _id: { $const: null },
            count: { $sum: { $const: 1 } }
        },
        nReturned: NumberLong(1),
        executionTimeMillisEstimate: NumberLong(16285),
    },
    {
        $project: { 
            count: true, 
            _id: false
        },
        nReturned: NumberLong(1),
        executionTimeMillisEstimate: NumberLong(16285),
    },
]

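Is there any other way to improve the time?

Part 2 (all stages)

This part really drives me crazy. I added the following index on the Customer collection, hoping the lookup could make full use of it.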

{
    _id: 1.0,
    state: 1.0,
}

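After adding the $lookup, the query needs an extra 766,596 - 14,391 = 752,205 ms to complete (OMG...). In theory, since Customer has a compound index containing every field the lookup's $match needs, the lookup should be served by an IXSCAN. And because $unwind and $count need no additional fields, I would expect the query to finish without fetching any documents. But I can't find a way to make explain show what happens inside the $lookup stage, so I don't know where it can be improved.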

[
    ...
    {
        $match: {
            totalDensity: { $gte: 1.0 },
        },
        nReturned: NumberLong(2632494),
        executionTimeMillisEstimate: NumberLong(14391),
    },
    {
        $lookup: { ... },
        nReturned: NumberLong(2313852),
        executionTimeMillisEstimate: NumberLong(766596),
    },
    {
        $group: {
            _id: { $const: null },
            count: { $sum: { $const: 1 } },
        },
        nReturned: NumberLong(1),
        executionTimeMillisEstimate: NumberLong(766596),
    },
    {
        $project: {
            count: true,
            _id: false,
        },
        nReturned: NumberLong(1),
        executionTimeMillisEstimate: NumberLong(766596),
    }
]

Tags: mongodb mongodb-query lookup group compound-index

1 answer

0 votes

You don't need { "$unwind": "$Customer" }. This should be faster:

{ $project: { size: { $size: "$Customer" } } },
{ $group: { _id: null, count: { $sum: "$size" } } }
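
To see why the two forms return the same number, here is a plain JavaScript sketch that mimics both stage sequences in memory, with no MongoDB involved. The sample documents are invented for illustration, and the sketch only shows that the results agree; it says nothing about how much faster the second form runs on a real server.

```javascript
// Hypothetical stand-in for documents leaving the $lookup stage:
// each carries a Customer array with zero or one matching customers.
const afterLookup = [
  { _id: 1, totalCount: 3, Customer: [{ _id: 1, state: "active" }] },
  { _id: 2, totalCount: 2, Customer: [] },
  { _id: 3, totalCount: 5, Customer: [{ _id: 3, state: "active" }] },
];

// Mimics $unwind + $count: emit one document per array element
// (documents with an empty array vanish), then count what is left.
const viaUnwind = afterLookup.flatMap((doc) =>
  doc.Customer.map((customer) => ({ ...doc, Customer: customer }))
).length;

// Mimics $project { size: { $size } } + $group { $sum: "$size" }:
// add up the array lengths without materialising unwound documents.
const viaSize = afterLookup
  .map((doc) => ({ size: doc.Customer.length }))
  .reduce((sum, doc) => sum + doc.size, 0);

console.log(viaUnwind, viaSize);
```

Both variables hold the same count because summing the array sizes counts exactly the elements that $unwind would have emitted as separate documents.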