Using mongo server v3.6.16.
I have a mongo collection with about 18 million records. Around 100K records are added each day. I have a query that runs frequently against the collection and depends on two values - user_id
and server_time_stamp
. I have a compound index set up for those two fields.
The index regularly goes stale - the query takes minutes to complete and drives the server to consume all the CPU it can grab. Once the index is regenerated, the query runs quickly. But after a day or two, the index is stale again. (ed. the index now fails faster - within 30 minutes.) I have no idea why the index keeps going stale - what can I look for?
Edit
Here are the index fields:
{
"uid" : 1,
"server_time_stamp" : -1
}
and the index options:
{
"v" : 2,
"name" : "server_time_stamp_1_uid_1",
"ns" : "sefaria.user_history"
}
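For reference, an index with this key pattern and custom name would be created in the mongo shell roughly like this (a sketch based on the definition above; the collection name is taken from the ns field):

```javascript
// mongo shell, against the sefaria database:
// recreate the compound index on user_history with the key order
// and the (non-default) name shown above
db.user_history.createIndex(
    { "uid" : 1, "server_time_stamp" : -1 },
    { "name" : "server_time_stamp_1_uid_1" }
)
```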
This seems to be a Heisenbug: when I run the query with explain, it performs well. Here is one pathological query, pulled from the slow query log, that took 445 seconds:
sefaria.user_history command: find { find: "user_history", filter: { server_time_stamp: { $gt: 1577918252 }, uid: 80588 }, sort: { _id: 1 }, lsid: { id: UUID("4936fb55-8514-4442-b852-306686985126") }, $db: "sefaria", $readPreference: { mode: "primaryPreferred" } } planSummary: IXSCAN { _id: 1 } keysExamined:17286277 docsExamined:17286277 cursorExhausted:1 numYields:142780 nreturned:79 reslen:35375 locks:{ Global: { acquireCount: { r: 285562 } }, Database: { acquireCount: { r: 142781 } }, Collection: { acquireCount: { r: 142781 } } } protocol:op_msg 445101ms
And here is the result of explain
on the query, run immediately after regenerating the index:
{
"queryPlanner" : {
"plannerVersion" : NumberInt(1),
"namespace" : "sefaria.user_history",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"uid" : {
"$eq" : 80588.0
}
},
{
"server_time_stamp" : {
"$gt" : 1577918252.0
}
}
]
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"uid" : NumberInt(1),
"server_time_stamp" : NumberInt(-1)
},
"indexName" : "server_time_stamp_1_uid_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"uid" : [
],
"server_time_stamp" : [
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : NumberInt(2),
"direction" : "forward",
"indexBounds" : {
"uid" : [
"[80588.0, 80588.0]"
],
"server_time_stamp" : [
"[inf.0, 1577918252.0)"
]
}
}
},
"rejectedPlans" : [
{
"stage" : "FETCH",
"filter" : {
"server_time_stamp" : {
"$gt" : 1577918252.0
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"uid" : NumberInt(1),
"book" : NumberInt(1),
"last_place" : NumberInt(1)
},
"indexName" : "uid_1_book_1_last_place_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"uid" : [
],
"book" : [
],
"last_place" : [
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : NumberInt(2),
"direction" : "forward",
"indexBounds" : {
"uid" : [
"[80588.0, 80588.0]"
],
"book" : [
"[MinKey, MaxKey]"
],
"last_place" : [
"[MinKey, MaxKey]"
]
}
}
},
{
"stage" : "FETCH",
"filter" : {
"server_time_stamp" : {
"$gt" : 1577918252.0
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"uid" : NumberInt(1)
},
"indexName" : "uid",
"isMultiKey" : false,
"multiKeyPaths" : {
"uid" : [
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : NumberInt(2),
"direction" : "forward",
"indexBounds" : {
"uid" : [
"[80588.0, 80588.0]"
]
}
}
}
]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : NumberInt(97),
"executionTimeMillis" : NumberInt(1),
"totalKeysExamined" : NumberInt(97),
"totalDocsExamined" : NumberInt(97),
"executionStages" : {
"stage" : "FETCH",
"nReturned" : NumberInt(97),
"executionTimeMillisEstimate" : NumberInt(0),
"works" : NumberInt(99),
"advanced" : NumberInt(97),
"needTime" : NumberInt(0),
"needYield" : NumberInt(0),
"saveState" : NumberInt(3),
"restoreState" : NumberInt(3),
"isEOF" : NumberInt(1),
"invalidates" : NumberInt(0),
"docsExamined" : NumberInt(97),
"alreadyHasObj" : NumberInt(0),
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : NumberInt(97),
"executionTimeMillisEstimate" : NumberInt(0),
"works" : NumberInt(98),
"advanced" : NumberInt(97),
"needTime" : NumberInt(0),
"needYield" : NumberInt(0),
"saveState" : NumberInt(3),
"restoreState" : NumberInt(3),
"isEOF" : NumberInt(1),
"invalidates" : NumberInt(0),
"keyPattern" : {
"uid" : NumberInt(1),
"server_time_stamp" : NumberInt(-1)
},
"indexName" : "server_time_stamp_1_uid_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"uid" : [
],
"server_time_stamp" : [
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : NumberInt(2),
"direction" : "forward",
"indexBounds" : {
"uid" : [
"[80588.0, 80588.0]"
],
"server_time_stamp" : [
"[inf.0, 1577918252.0)"
]
},
"keysExamined" : NumberInt(97),
"seeks" : NumberInt(1),
"dupsTested" : NumberInt(0),
"dupsDropped" : NumberInt(0),
"seenInvalidated" : NumberInt(0)
}
}
},
"serverInfo" : {
"host" : "mongo-deployment-5cf4f4fff6-dz84r",
"port" : NumberInt(27017),
"version" : "3.6.15",
"gitVersion" : "18934fb5c814e87895c5e38ae1515dd6cb4c00f7"
},
"ok" : 1.0
}
sefaria.user_history command: find { find: "user_history", filter: { server_time_stamp: { $gt: 1577918252 }, uid: 80588 }, sort: { _id: 1 }, lsid: { id: UUID("4936fb55-8514-4442-b852-306686985126") }, $db: "sefaria", $readPreference: { mode: "primaryPreferred" } } planSummary: IXSCAN { _id: 1 } keysExamined:17286277 docsExamined:17286277 cursorExhausted:1 numYields:142780 nreturned:79 reslen:35375 locks:{ Global: { acquireCount: { r: 285562 } }, Database: { acquireCount: { r: 142781 } }, Collection: { acquireCount: { r: 142781 } } } protocol:op_msg 445101ms
Looking at the query plan, the query is using the _id index. Is that because you have a sort on the _id field? I looked at the other plans attached.
"executionSuccess" : true,
"nReturned" : NumberInt(97),
"executionTimeMillis" : NumberInt(1),
"totalKeysExamined" : NumberInt(97),
"totalDocsExamined" : NumberInt(97),
The ratio of documents returned to documents examined is 1:1.
Also, the query is using:
"indexName" : "server_time_stamp_1_uid_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"uid" : [
],
"server_time_stamp" : [
]
},
I think something is missing from both queries - possibly the sort, which is not mentioned in the good plan. Can you check that?
I believe the problem here was memory. The instance was running close to the limit of its physical memory. I can't say for certain, but I believe the relevant index was being evicted from memory, and the poor query performance was the result. Regenerating the index forced it back into memory (presumably something else got evicted instead.)
I've moved the instance to a node with more memory, and so far it seems to be performing well.
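One way to check this theory on a running instance is to compare the sizes of the collection's indexes against the WiredTiger cache (a mongo shell sketch; the stats field names are from the MongoDB 3.6 collStats/serverStatus output):

```javascript
// Size of each index on the collection, in bytes
db.user_history.stats().indexSizes

// WiredTiger cache: configured maximum vs. what is currently resident
var cache = db.serverStatus().wiredTiger.cache;
print(cache["maximum bytes configured"]);
print(cache["bytes currently in the cache"]);
```

If the indexes needed by the hot queries don't fit in the cache alongside the working set, they will be evicted under load.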
This behavior is due to the index not being able to both be selective and serve the sort.
The log line for the slow operation shows the operation using the _id
index. The query planner likely makes this selection to avoid having to sort results in memory (note the absence of hasSortStage: 1
). As a consequence, however, it needs to scan far more documents (docsExamined:17286277
), which takes much longer.
Memory contention could also be playing a part. Depending on load, the overhead of sorting results in memory may be contributing to pushing the indexes out of RAM and to the _id index being selected.
A few comments:
As Babu noted, the explain posted above doesn't include the sort. Including the sort would likely show that stage consuming more time than the IXSCAN.
The name of the index (server_time_stamp_1_uid_1
) suggests that server_time_stamp
was placed first in the index, followed by uid
. Equality matches should be prioritized; i.e., uid
should be placed before the range.
Some options to consider:
Create the index { "uid" : 1, "_id" : 1, "server_time_stamp" : 1 }
. See the guidance on sorting with indexes here. Results may be mixed, though, since both _id
and server_time_stamp
likely have high cardinality, which means you may still be trading off scanning documents against avoiding a sort.
Assuming the _id
values are auto-generated, consider filtering on _id
rather than server_time_stamp
. This would allow you to both bound and sort using _id
. The _id
contains a timestamp, so it will also be relatively unique.
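That last option can be sketched without a server: the first 4 bytes of an auto-generated ObjectId are a big-endian Unix timestamp, so a lower bound for an _id range can be derived directly from a server_time_stamp value (the helper name below is mine, not a driver API):

```javascript
// Build the 24-char hex string of a boundary ObjectId whose embedded
// timestamp is `ts` and whose remaining 8 bytes are zero. Any ObjectId
// generated after `ts` compares greater than this boundary.
function objectIdLowerBound(ts) {
  var tsHex = Math.floor(ts).toString(16).padStart(8, "0");
  return tsHex + "0000000000000000"; // 8 zero bytes = 16 hex chars
}

var bound = objectIdLowerBound(1577918252);
// In the mongo shell the original query could then become:
//   db.user_history.find({ uid: 80588, _id: { $gt: ObjectId(bound) } })
//                  .sort({ _id: 1 })
// which both bounds and sorts on a single { uid: 1, _id: 1 } index.
console.log(bound); // "5e0d1f2c0000000000000000"
```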