Best compound index for $exists: true (sparse index)

Question

I need to speed up this kind of query:

db.col.find({ a: "foobar", b: { $exists: true} });

Data distribution

Field presence:

Field `a` exists in all documents; field `b` exists in only ~10% of them.

Current collection statistics:

db.col.count() // 1,050,505
db.col.count({ a : "foobar" }) // 517,967
db.col.count({ a : "foobar", b : { $exists: true} }) // 44,922
db.col.count({ b : { $exists: true} }) // 88,981
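A quick arithmetic check on these counts (plain JavaScript, numbers copied from the lines above; the variable names are only illustrative) shows that `b` is spread evenly: about 8.5% of all documents and about 8.7% of the `a: "foobar"` documents have it. A scan bounded only on `a` therefore touches more than ten keys per matching document.

```javascript
// Counts copied from the db.col.count() calls above.
const stats = {
  total: 1050505,
  withA: 517967,    // a: "foobar"
  withAandB: 44922, // a: "foobar" and b exists
  withB: 88981,     // b exists, any a
};

// Fraction part/whole, rounded to 4 decimal places.
function share(part, whole) {
  return Math.round((part / whole) * 10000) / 10000;
}

const bOverall = share(stats.withB, stats.total);          // docs with b, whole collection
const bWithinFoobar = share(stats.withAandB, stats.withA); // docs with b among a: "foobar"
const foobarOverall = share(stats.withA, stats.total);     // docs with a: "foobar"

console.log({ bOverall, bWithinFoobar, foobarOverall });
```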

Future data growth:

So far two batches (2 × ~500,000) have been loaded. Every month another batch of ~500,000 documents will be added.

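Field `a` holds the name of the batch. The newly added documents will have the same field distribution (about 10% of each newly loaded batch will have the `b` field).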

My attempts and research

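I created a sparse index on `{a:1, b:1}`, but because `a` exists in all documents, this does not speed anything up. That is because of how sparse indexes behave in MongoDB. From the docs: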

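"Sparse compound indexes that only contain ascending/descending index keys will index a document as long as the document contains at least one of the keys."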
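A minimal JavaScript sketch of that rule (a toy model for top-level fields only, not MongoDB code; `sparseCompoundIndexes` and the sample documents are hypothetical) shows why `sparse: true` buys nothing for `{a: 1, b: 1}` here: every document has `a`, so every document is indexed.

```javascript
// Toy model of the documented rule: a sparse compound index with only
// ascending/descending keys indexes a document if it has at least one key field.
function sparseCompoundIndexes(doc, keyFields) {
  return keyFields.some((field) => Object.prototype.hasOwnProperty.call(doc, field));
}

// Hypothetical sample documents shaped like the ones in the question.
const docs = [
  { a: "foobar", b: 1 },
  { a: "foobar" }, // no b, but a is present: still indexed
  { a: "other" },  // same: indexed because of a
  { c: true },     // neither a nor b: skipped by the sparse index
];

const indexed = docs.filter((d) => sparseCompoundIndexes(d, ["a", "b"]));
console.log(indexed.length);
```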

Here is the `.explain()` output for the query above:

{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "myCol",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "$and" : [ 
                {
                    "a" : {
                        "$eq" : "foobar"
                    }
                }, 
                {
                    "b" : {
                        "$exists" : true
                    }
                }
            ]
        },
        "winningPlan" : {
            "stage" : "KEEP_MUTATIONS",
            "inputStage" : {
                "stage" : "FETCH",
                "filter" : {
                    "b" : {
                        "$exists" : true
                    }
                },
                "inputStage" : {
                    "stage" : "IXSCAN",
                    "keyPattern" : {
                        "a" : 1,
                        "b" : 1
                    },
                    "indexName" : "a_1_b_1",
                    "isMultiKey" : false,
                    "direction" : "forward",
                    "indexBounds" : {
                        "a" : [ 
                            "[\"foobar\", \"foobar\"]"
                        ],
                        "b" : [ 
                            "[MinKey, MaxKey]"
                        ]
                    }
                }
            }
        },
        "rejectedPlans" : []
    },
    "executionStats" : {
        "executionSuccess" : true,
        "nReturned" : 44922,
        "executionTimeMillis" : 208656,
        "totalKeysExamined" : 517967,
        "totalDocsExamined" : 517967,
        "executionStages" : {
            "stage" : "KEEP_MUTATIONS",
            "nReturned" : 44922,
            "executionTimeMillisEstimate" : 180672,
            "works" : 550772,
            "advanced" : 44922,
            "needTime" : 473045,
            "needFetch" : 32804,
            "saveState" : 41051,
            "restoreState" : 41051,
            "isEOF" : 1,
            "invalidates" : 0,
            "inputStage" : {
                "stage" : "FETCH",
                "filter" : {
                    "b" : {
                        "$exists" : true
                    }
                },
                "nReturned" : 44922,
                "executionTimeMillisEstimate" : 180612,
                "works" : 550772,
                "advanced" : 44922,
                "needTime" : 473045,
                "needFetch" : 32804,
                "saveState" : 41051,
                "restoreState" : 41051,
                "isEOF" : 1,
                "invalidates" : 0,
                "docsExamined" : 517967,
                "alreadyHasObj" : 0,
                "inputStage" : {
                    "stage" : "IXSCAN",
                    "nReturned" : 517967,
                    "executionTimeMillisEstimate" : 3035,
                    "works" : 517967,
                    "advanced" : 517967,
                    "needTime" : 0,
                    "needFetch" : 0,
                    "saveState" : 41051,
                    "restoreState" : 41051,
                    "isEOF" : 1,
                    "invalidates" : 0,
                    "keyPattern" : {
                        "a" : 1,
                        "b" : 1
                    },
                    "indexName" : "a_1_b_1",
                    "isMultiKey" : false,
                    "direction" : "forward",
                    "indexBounds" : {
                        "a" : [ 
                            "[\"foobar\", \"foobar\"]"
                        ],
                        "b" : [ 
                            "[MinKey, MaxKey]"
                        ]
                    },
                    "keysExamined" : 517967, // INFO: I think that this is too much. These are all documents having a:"foobar"
                    "dupsTested" : 0,
                    "dupsDropped" : 0,
                    "seenInvalidated" : 0,
                    "matchTested" : 0
                }
            }
        },
        "allPlansExecution" : []
    },
    "serverInfo" : {
        "host" : "productive-mongodb-16",
        "port" : 27000,
        "version" : "3.0.1",
        "gitVersion" : "534b5a3f9d10f00cd27737fbcd951032248b5952"
    }
}

`a` exists in all (roughly 1,000,000) documents, and 520,000 of them contain `a: "foobar"`. Across the whole collection, 88,000 documents have the `b` field.
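Two figures from the `executionStats` above sum up the cost. A small JavaScript helper (hypothetical; field values copied from the output) computes how many fetched documents the `FETCH` stage's `$exists` filter discarded, and how many index keys were examined per returned document. A tightly bounded scan would be close to one key per result.

```javascript
// Fields copied from the executionStats section of the .explain() output.
const executionStats = {
  nReturned: 44922,
  totalKeysExamined: 517967,
  totalDocsExamined: 517967,
};

// Summarises how much work the plan did relative to what it returned.
function summarize(stats) {
  return {
    // Documents fetched and then rejected by the { b: { $exists: true } } filter.
    fetchedButDiscarded: stats.totalDocsExamined - stats.nReturned,
    // Index keys examined per returned document, rounded to 2 decimals.
    keysPerResult: Number((stats.totalKeysExamined / stats.nReturned).toFixed(2)),
  };
}

console.log(summarize(executionStats));
```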

How can I speed up my query (so that the IXSCAN returns only 44k keys instead of 520k)?

Answer 1

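What you don't seem to understand here is that `$exists` cannot "grab" an index in any way, even when the index is sparse. As the documentation itself says: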

"MongoDB will not use a sparse index if it would result in an incomplete result set for queries and sort operations."

The example given on those pages is a `{ "$exists": false }` query, but the reverse logical condition makes no difference here.
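A toy JavaScript model can make the bounds argument concrete. It rests on simplifying assumptions (only null, numbers and strings are modelled; the helper names are hypothetical; this is not how the server's planner is written): in the `{a: 1, b: 1}` index a document without `b` gets a `null` key for `b`, the same key as an explicit `b: null`, and `null` sorts before numbers and strings. Because `$exists: true` must return the explicit-null documents but not the missing-field ones, no `b` bounds narrower than `[MinKey, MaxKey]` can separate them (the bounds visible in the question's explain output), so every `a: "foobar"` entry is scanned. A range restricted to one type, such as a numeric range, skips the `null` keys entirely.

```javascript
// Toy model only: a small subset of BSON key ordering (null < numbers < strings).
const TYPE_RANK = { null: 1, number: 2, string: 3 };

function rank(value) {
  return value === null ? TYPE_RANK.null : TYPE_RANK[typeof value];
}

// Compare two b-keys: first by type rank, then by value within the same type.
function compareKeys(x, y) {
  const diff = rank(x) - rank(y);
  if (diff !== 0) return diff;
  if (x === null) return 0;
  return x < y ? -1 : x > y ? 1 : 0;
}

// One [a, b] entry per document; a missing b is stored as null, like an explicit null.
function buildIndex(docs) {
  return docs.map((d) => [d.a, d.b === undefined ? null : d.b]);
}

// Count entries inside the bounds a == aValue, lo <= b <= hi.
// An undefined lo/hi stands in for MinKey/MaxKey (unbounded).
function keysInBounds(index, aValue, lo, hi) {
  return index.filter(
    ([a, b]) =>
      a === aValue &&
      (lo === undefined || compareKeys(b, lo) >= 0) &&
      (hi === undefined || compareKeys(b, hi) <= 0)
  ).length;
}

// Hypothetical sample documents.
const docs = [
  { a: "foobar", b: 5 },
  { a: "foobar", b: "x" },
  { a: "foobar" },          // missing b: indexed as null
  { a: "foobar", b: null }, // explicit null: same key, yet $exists: true must return it
  { a: "other", b: 7 },
];
const index = buildIndex(docs);

// b: { $exists: true } -> b bounds [MinKey, MaxKey]: every a: "foobar" entry is in range.
console.log(keysInBounds(index, "foobar", undefined, undefined));
// b: { $gte: -9999, $lte: 9999 } -> numeric bounds exclude the null entries.
console.log(keysInBounds(index, "foobar", -9999, 9999));
```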

To get the "full benefit" of a "sparse" index, you need to consider the "type" of data it holds and query appropriately.

For numbers, something like:

db.collection.find({ "a": "foobar", "b": { "$gte": -9999, "$lte": 9999 } })

That uses the index, and the sparse one at that. Or, for text:

db.collection.find({ "a": "foobar", "b": /.+/ })

This also uses the sparse index and only looks at the entries where "b" is defined.
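Before swapping `/.+/` in for `$exists: true`, note that the two are not equivalent: a regular expression only matches string values, and `.+` needs at least one character that is not a line break, so documents with `b: ""` (or a non-string `b`) are not returned. Below is a rough plain-JavaScript approximation for a top-level scalar field; the helper names are hypothetical, and JavaScript's `.` and the server's PCRE `.` differ slightly on which characters count as line breaks.

```javascript
// Approximates { b: /.+/ } for a top-level scalar field: only strings can match,
// and ".+" needs at least one character that is not a line terminator.
function matchesNonEmptyRegex(doc) {
  return typeof doc.b === "string" && /.+/.test(doc.b);
}

// Approximates { b: { $exists: true } } for a top-level field.
function hasField(doc) {
  return Object.prototype.hasOwnProperty.call(doc, "b");
}

// Hypothetical sample values for b.
const samples = [{ b: "abc" }, { b: "" }, { b: "\n" }, { b: 42 }, {}];
for (const doc of samples) {
  console.log(JSON.stringify(doc), hasField(doc), matchesNonEmptyRegex(doc));
}
```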

For "arrays", "be careful", because the value being inspected is likely to be one of the types above, unless you do:

db.collection.insert({ "a": 1, "b": [[]] })

then you can do this:

db.collection.find({ "a": 1, "b": { "$type": 4 } })

But that will not really use the "sparse" index either, for the same reason that `$exists` does not work here.
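The `[[]]` in the insert above is deliberate. In MongoDB releases before 3.6 (the question runs 3.0.1), `$type` applied to an array field tests the array's elements, so `{ b: { $type: 4 } }` only matches when `b` contains a nested array. A plain-JavaScript approximation of that older rule, for a top-level field only (the function name is hypothetical; 3.6 and later also match the array itself):

```javascript
// Approximates { b: { $type: 4 } } under pre-3.6 semantics: for an array value,
// the type test applies to its elements, so only an array holding an array matches.
function matchesType4Pre36(doc) {
  const b = doc.b;
  return Array.isArray(b) && b.some((el) => Array.isArray(el));
}

// Hypothetical sample values for b.
const samples = [{ b: [[]] }, { b: [] }, { b: [1, 2] }, { b: "x" }, {}];
for (const doc of samples) {
  console.log(JSON.stringify(doc), matchesType4Pre36(doc));
}
```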

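So if you expect maximum performance, you need to understand what these terms mean and "query appropriately" so that the index definition you created is actually used.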

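These are all clear examples that you can test yourself to confirm the results are correct. I do wish the core documentation were clearer on these points, but I also know that many people have tried to contribute (with excellent explanations), and none of that has been included so far.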

I guess that is why you are asking here.
