I have been playing around with MongoDb with Sharding, These NoSQL datastores are gaining popularity due ease of use and they can be scaled out i.e horizontal scaling. This horizontal scaling in MongoDB can be achieved by creating a sharded cluster of mongodb instances.
sh.addShard(
sh.enableSharding(
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("5a2aadadeb346a72fb38f209")
}
shards:
{ "_id" : "rs1", "host" : "rs1/localhost:27012,localhost:27013", "state" : 1 }
{ "_id" : "rs2", "host" : "rs2/localhost:27014,localhost:27015", "state" : 1 }
active mongoses:
"3.4.7" : 1
autosplit:
Currently enabled: yes
balancer:
Currently enabled: yes
Currently running: no
Balancer lock taken at Mon Jan 08 2018 12:04:06 GMT+0530 (India Standard Time) by ConfigServer:Balancer
Failed balancer rounds in last 5 attempts: 5
Last reported error: could not find host matching read preference { mode: "primary" } for set rs2
Time of Reported error: Mon Jan 01 2018 20:18:34 GMT+0530 (India Standard Time)
Migration Results for the last 24 hours:
No recent migrations
databases:
{ "_id" : "students", "primary" : "rs1", "partitioned" : true }
students.grades
shard key: { "student_id" : 1 }
unique: false
balancing: true
chunks:
rs1 5
rs2 4
{ "student_id" : { "$minKey" : 1 } } -->> { "student_id" : 1 } on : rs2 Timestamp(2, 0)
{ "student_id" : 1 } -->> { "student_id" : 6 } on : rs2 Timestamp(3, 0)
{ "student_id" : 6 } -->> { "student_id" : 38310 } on : rs2 Timestamp(4, 0)
{ "student_id" : 38310 } -->> { "student_id" : 82368 } on : rs2 Timestamp(5, 0)
{ "student_id" : 82368 } -->> { "student_id" : 120672 } on : rs1 Timestamp(5, 1)
{ "student_id" : 120672 } -->> { "student_id" : 159241 } on : rs1 Timestamp(3, 3)
{ "student_id" : 159241 } -->> { "student_id" : 197545 } on : rs1 Timestamp(4, 2)
{ "student_id" : 197545 } -->> { "student_id" : 241433 } on : rs1 Timestamp(4, 3)
{ "student_id" : 241433 } -->> { "student_id" : { "$maxKey" : 1 } } on : rs1 Timestamp(4, 4)
{ "_id" : "uts", "primary" : "rs2", "partitioned" : true }
uts.irctc
shard key: { "userid" : 1 }
unique: false
balancing: true
chunks:
rs1 1
rs2 2
{ "userid" : { "$minKey" : 1 } } -->> { "userid" : 1933 } on : rs2 Timestamp(2, 1)
{ "userid" : 1933 } -->> { "userid" : 8087 } on : rs2 Timestamp(1, 2)
{ "userid" : 8087 } -->> { "userid" : { "$maxKey" : 1 } } on : rs1 Timestamp(2, 0)
The MongoDB reference has a very clear explanation for the same here.
A sharded setup of mongodb requires the following:
- Mongodb Configuration server – this stores the cluster’s metadata
- mongos instance – this is the router and routes the queries to different shards based on the sharding key
- Individual mongodb instances – these act as the shards.
- Lets create all of the above components in a single instance i.e on your localhost.
- Create directories for db path and log
- 2 config server, 2 set of shard
C:\MongoDB\shard\configdb0,configdb1
C:\MongoDB\shard\sdata1_1,sdata1_2,sdata2_1,sdata2_2
For Config Server execute the below steps:
mongod --configsvr --replSet rs0 --port 27009 --dbpath C:\MongoDB\shard\configdb0 --logpath "C:\MongoDB\shard\logs\conf0.log"
mongod --configsvr --replSet rs0 --port 27010 --dbpath C:\MongoDB\shard\configdb1 --logpath "C:\MongoDB\shard\logs\conf1.log"
mongo -host localhost -port 27009
rs.initiate(
{
_id: "rs0",
configsvr: true,
members: [{ _id : 0, host : "localhost:27009" },
{ _id : 1, host : "localhost:27010" }]
}
)
For making shard servers execute below steps:
mongod --shardsvr --port 27012 --replSet rs1 --logpath "C:\MongoDB\shard\logs\shard1_1.log" --logappend --dbpath "C:\MongoDB\shard\sdata1_1"
mongod --shardsvr --port 27013 --replSet rs1 --logpath "C:\MongoDB\shard\logs\shard1_2.log" --logappend --dbpath "C:\MongoDB\shard\sdata1_2"
mongo -host localhost -port 27012
rs.initiate(
{
_id: "rs1",
members: [{ _id : 0, host : "localhost:27012" },
{ _id : 1, host : "localhost:27013" }]
}
)mongod --shardsvr --port 27014 --replSet rs2 --logpath "C:\MongoDB\shard\logs\shard2_1.log" --logappend --dbpath "C:\MongoDB\shard\sdata2_1"
mongod --shardsvr --port 27015 --replSet rs2 --logpath "C:\MongoDB\shard\logs\shard2_2.log" --logappend --dbpath "C:\MongoDB\shard\sdata2_2"
mongo -host localhost -port 27014
rs.initiate(
{
_id: "rs2",
members: [{ _id : 0, host : "localhost:27014" },
{ _id : 1, host : "localhost:27015" }]
}
)
Now Registering the shards with mongos
Now that we have created our two mongodb shards clusters running at localhost:27012, localhost:27013, localhost:27014 and localhost:27015 respectively, we will go ahead and register these shards with our mongos query router, also define which database we need to shard and then enable sharding on the collection we are interested by providing the shard key. All these have to be carried out by connecting to the mongos query router as shown in the below commands:
mongo --port 27011 --host localhost
sh.addShard("localhost:27012")
sh.addShard("localhost:27013")
"localhost:27013")
sh.addShard("localhost:27014")
sh.addShard("localhost:27014")
sh.addShard("localhost:27015")
sh.addShard("localhost:27015")
I will be using the students database having collection grades. The structure of the documents in grades is given below:
{
"student_id" : 0,
"type" : "exam",
"score" : 54.6535436362647
}
"student_id" : 0,
"type" : "exam",
"score" : 54.6535436362647
}
sh.enableSharding("students")
sh.shardCollection("students.grades", {"student_id" : 1})
"students.grades", {"student_id" : 1})
In the sh.shardCollection we specify the collection and the field from the collection which is to be used as a shard key.
In the sh.shardCollection we specify the collection and the field from the collection which is to be used as a shard key.
Adding data to the mongodb sharded cluster
for ( i = 1; i < 10000; i++ ) {
db.grades.insert({student_id: i, type: "exam", score : Math.random() * 100 });
db.grades.insert({student_id: i, type: "quiz", score : Math.random() * 100 });
db.grades.insert({student_id: i, type: "homework", score : Math.random() * 100 });
}
Lets look at the status of the shards by connecting to the mongos. It can be achieved by using the sh.status() command.
mongo --port 27011 --host localhost
--- Sharding Status ---
sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("5a2aadadeb346a72fb38f209")
}
shards:
{ "_id" : "rs1", "host" : "rs1/localhost:27012,localhost:27013", "state" : 1 }
{ "_id" : "rs2", "host" : "rs2/localhost:27014,localhost:27015", "state" : 1 }
active mongoses:
"3.4.7" : 1
autosplit:
Currently enabled: yes
balancer:
Currently enabled: yes
Currently running: no
Balancer lock taken at Mon Jan 08 2018 12:04:06 GMT+0530 (India Standard Time) by ConfigServer:Balancer
Failed balancer rounds in last 5 attempts: 5
Last reported error: could not find host matching read preference { mode: "primary" } for set rs2
Time of Reported error: Mon Jan 01 2018 20:18:34 GMT+0530 (India Standard Time)
Migration Results for the last 24 hours:
No recent migrations
databases:
{ "_id" : "students", "primary" : "rs1", "partitioned" : true }
students.grades
shard key: { "student_id" : 1 }
unique: false
balancing: true
chunks:
rs1 5
rs2 4
{ "student_id" : { "$minKey" : 1 } } -->> { "student_id" : 1 } on : rs2 Timestamp(2, 0)
{ "student_id" : 1 } -->> { "student_id" : 6 } on : rs2 Timestamp(3, 0)
{ "student_id" : 6 } -->> { "student_id" : 38310 } on : rs2 Timestamp(4, 0)
{ "student_id" : 38310 } -->> { "student_id" : 82368 } on : rs2 Timestamp(5, 0)
{ "student_id" : 82368 } -->> { "student_id" : 120672 } on : rs1 Timestamp(5, 1)
{ "student_id" : 120672 } -->> { "student_id" : 159241 } on : rs1 Timestamp(3, 3)
{ "student_id" : 159241 } -->> { "student_id" : 197545 } on : rs1 Timestamp(4, 2)
{ "student_id" : 197545 } -->> { "student_id" : 241433 } on : rs1 Timestamp(4, 3)
{ "student_id" : 241433 } -->> { "student_id" : { "$maxKey" : 1 } } on : rs1 Timestamp(4, 4)
{ "_id" : "uts", "primary" : "rs2", "partitioned" : true }
uts.irctc
shard key: { "userid" : 1 }
unique: false
balancing: true
chunks:
rs1 1
rs2 2
{ "userid" : { "$minKey" : 1 } } -->> { "userid" : 1933 } on : rs2 Timestamp(2, 1)
{ "userid" : 1933 } -->> { "userid" : 8087 } on : rs2 Timestamp(1, 2)
{ "userid" : 8087 } -->> { "userid" : { "$maxKey" : 1 } } on : rs1 Timestamp(2, 0)
We can also connect to individual shards and query to find out the max and minimum student ids available.
mongo --host localhost --port 27012
>use students
>db.grades.find().sort({student_id : 1}).limit(1)
same thing u can do with others shard servers.
Lets execute the same set of queries on the mongos query router and see that the results this time will include data from all the shards and not just individual shard.
mongos --configdb rs0/localhost:27009,localhost:27010 --port 27011
> use students
> db.grades.find().sort({student_id:-1}).limit(1).pretty()
So this brings to the end of setting up sharded mongodb cluster on localhost. Hope this information will help others!
References: MOHAMED SANAULLA ;Treselle Engineering