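I have a database containing 800k objects, and I have defined about 13 shard servers for fast access to the data. I assigned each object a letter to use for sharding, e.g. shards: 'a' for the first object, shards: 'b' for the second object, and so on. I created a shard key from the shards field of each object and want the objects distributed as evenly as possible across the 13 shard servers. I used "hashed" as the shard key type for the shards field. I distributed the letters evenly across all objects, e.g. 50k objects have shards: 'a', 50k objects have shards: 'b', and so on. I sharded the collection with sh.shardCollection("test.testCollection", { "shards": "hashed" } ), but the data only went to two of the 13 shard servers. The distribution between those two servers is also uneven: about 72% went to one server and 28% to the other. I want the data evenly distributed across all 13 shard servers. Can you help me solve this?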
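To make the setup concrete, here is a rough sketch of the letter assignment described above. The helper names `shardLetter` and `letterCounts` are hypothetical, and the sketch assumes exactly 800,000 objects, blocks of 50,000 consecutive objects per letter, and letters running contiguously from 'a'; the real assignment may differ in detail.

```javascript
// Hypothetical helper: letter stored in the `shards` field for the object
// at position `index`, assuming each block of `perLetter` consecutive
// objects shares one letter and the letters start at 'a'.
function shardLetter(index, perLetter) {
  const base = "a".charCodeAt(0);
  return String.fromCharCode(base + Math.floor(index / perLetter));
}

// Count how many objects receive each letter.
function letterCounts(totalObjects, perLetter) {
  const counts = new Map();
  for (let i = 0; i < totalObjects; i++) {
    const letter = shardLetter(i, perLetter);
    counts.set(letter, (counts.get(letter) || 0) + 1);
  }
  return counts;
}

// Assumed totals taken from the description: 800k objects, 50k per letter.
const counts = letterCounts(800000, 50000);
console.log(counts.size); // number of distinct values of the `shards` field
```

Under these assumptions the `shards` field takes 16 distinct values, each carried by 50,000 objects.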
[
{
_id: 'a',
host: 'a/127.0.0.1:21000,127.0.0.1:21001,127.0.0.1:21002',
state: 1,
topologyTime: Timestamp({ t: 1675107083, i: 3 })
},
{
_id: 'b',
host: 'b/127.0.0.1:22000,127.0.0.1:22001,127.0.0.1:22002',
state: 1,
topologyTime: Timestamp({ t: 1675107100, i: 5 })
},
{
_id: 'c',
host: 'c/127.0.0.1:23000,127.0.0.1:23001,127.0.0.1:23002',
state: 1,
draining: true
},
{
_id: 'd',
host: 'd/127.0.0.1:23010,127.0.0.1:23011,127.0.0.1:23012',
state: 1,
topologyTime: Timestamp({ t: 1676821653, i: 5 })
},
{
_id: 'e',
host: 'e/127.0.0.1:23020,127.0.0.1:23021,127.0.0.1:23022',
state: 1,
topologyTime: Timestamp({ t: 1676821663, i: 5 })
},
{
_id: 'f',
host: 'f/127.0.0.1:23030,127.0.0.1:23031,127.0.0.1:23032',
state: 1,
topologyTime: Timestamp({ t: 1676821668, i: 1 })
},
{
_id: 'g',
host: 'g/127.0.0.1:23040,127.0.0.1:23041,127.0.0.1:23042',
state: 1,
topologyTime: Timestamp({ t: 1676821673, i: 5 })
},
{
_id: 'h',
host: 'h/127.0.0.1:23050,127.0.0.1:23051,127.0.0.1:23052',
state: 1,
topologyTime: Timestamp({ t: 1676821678, i: 5 })
},
{
_id: 'j',
host: 'j/127.0.0.1:23060,127.0.0.1:23061,127.0.0.1:23062',
state: 1,
topologyTime: Timestamp({ t: 1676821685, i: 5 })
},
{
_id: 'k',
host: 'k/127.0.0.1:23070,127.0.0.1:23071,127.0.0.1:23072',
state: 1,
topologyTime: Timestamp({ t: 1676821689, i: 5 })
},
{
_id: 'l',
host: 'l/127.0.0.1:23080,127.0.0.1:23081,127.0.0.1:23082',
state: 1,
topologyTime: Timestamp({ t: 1676821694, i: 5 })
},
{
_id: 'm',
host: 'm/127.0.0.1:23090,127.0.0.1:23091,127.0.0.1:23092',
state: 1,
topologyTime: Timestamp({ t: 1676821698, i: 5 })
},
{
_id: 'n',
host: 'n/127.0.0.1:24000,127.0.0.1:24001,127.0.0.1:24002',
state: 1,
topologyTime: Timestamp({ t: 1676821708, i: 4 })
}
]
Shard a at a/127.0.0.1:21000,127.0.0.1:21001,127.0.0.1:21002
{
data: '125.57MiB',
docs: 227420,
chunks: 1,
'estimated data per chunk': '125.57MiB',
'estimated docs per chunk': 227420
}
Shard k at k/127.0.0.1:23070,127.0.0.1:23071,127.0.0.1:23072
{
data: '326.31MiB',
docs: 576209,
chunks: 1,
'estimated data per chunk': '326.31MiB',
'estimated docs per chunk': 576209
}
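The 72% / 28% split mentioned in the question can be checked against the document counts in the output above. This is only a small arithmetic sketch; the function name `shares` is hypothetical, and the counts are copied by hand from that output.

```javascript
// Document counts copied from the per-shard output above.
const docsPerShard = { a: 227420, k: 576209 };

// Hypothetical helper: percentage of all documents held by each shard.
function shares(counts) {
  const total = Object.values(counts).reduce((sum, n) => sum + n, 0);
  const result = {};
  for (const [shard, n] of Object.entries(counts)) {
    result[shard] = (100 * n) / total;
  }
  return result;
}

const pct = shares(docsPerShard);
console.log(pct.a.toFixed(1), pct.k.toFixed(1));
```

Rounded to whole numbers this gives roughly 28% on shard a and 72% on shard k, out of 803,629 documents in total.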
Sample object:
{
"_id": {
"$oid": "63dd7324289226c918818c55"
},
"Title": "",
"Product": {
"web1": {
"Harry Potter and the Chamber of Secrets: 2/7 (Harry Potter 2)": {
"Price": 15,
"Url": "https://www.amazon.com/Harry-Potter-Chamber-Secrets-Book/dp/B017V4IPPO/ref=sr_1_2?crid=GCT8C7Z3Q4SE&keywords=Harry+Potter+and+the+Chamber+of+Secrets&qid=1676836656&sprefix=harry+potter+and+the+chamber+of+secrets%2Caps%2C230&sr=8-2",
"Time": {
"$date": {
"$numberLong": "1676669514749"
}
}
}
}
},
"Category": [
"Book",
"Fantasy"
],
"Time": {
"$date": {
"$numberLong": "1676669514749"
}
},
"shards": "h"
}
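I want to make sure the data is evenly distributed across my shard servers, and I would like to understand what I need to do to achieve that.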