Monday, January 8, 2018

Mongo DB Sharding on Localhost Windows

I have been playing around with MongoDb with Sharding, These NoSQL datastores are gaining popularity due ease of use and they can be scaled out i.e horizontal scaling. This horizontal scaling in MongoDB can be achieved by creating a sharded cluster of mongodb instances.

The MongoDB reference has a very clear explanation for the same here.

A sharded setup of mongodb requires the following:

    • Mongodb Configuration server – this stores the cluster’s metadata
    • mongos instance – this is the router and routes the queries to different shards based on the sharding key
    • Individual mongodb instances – these act as the shards.
  • Lets create all of the above components in a single instance i.e on your localhost.
  • Create directories for db path and log
  • 2 config server, 2 set of shard
          C:\MongoDB\shard\configdb0,configdb1
          C:\MongoDB\shard\sdata1_1,sdata1_2,sdata2_1,sdata2_2



For Config Server execute the below steps:

  1. mongod --configsvr --replSet rs0 --port 27009 --dbpath C:\MongoDB\shard\configdb0 --logpath "C:\MongoDB\shard\logs\conf0.log"

  2. mongod --configsvr --replSet rs0 --port 27010 --dbpath C:\MongoDB\shard\configdb1 --logpath "C:\MongoDB\shard\logs\conf1.log"

  3. mongo -host localhost -port 27009

  4. rs.initiate(
      {
        _id: "rs0",
        configsvr: true,
        members: [{ _id : 0, host : "localhost:27009" },
          { _id : 1, host : "localhost:27010" }]
      }
    )

For making shard servers execute below steps:

  1. mongod --shardsvr --port 27012 --replSet rs1 --logpath "C:\MongoDB\shard\logs\shard1_1.log" --logappend --dbpath "C:\MongoDB\shard\sdata1_1"

  2. mongod --shardsvr --port 27013 --replSet rs1 --logpath "C:\MongoDB\shard\logs\shard1_2.log" --logappend --dbpath "C:\MongoDB\shard\sdata1_2"

  3. mongo -host localhost -port 27012

  4. rs.initiate(
      {
        _id: "rs1",
        members: [{ _id : 0, host : "localhost:27012" },
          { _id : 1, host : "localhost:27013" }]
      }
    )

  5. mongod --shardsvr --port 27014 --replSet rs2 --logpath "C:\MongoDB\shard\logs\shard2_1.log" --logappend --dbpath "C:\MongoDB\shard\sdata2_1"

  6. mongod --shardsvr --port 27015 --replSet rs2 --logpath "C:\MongoDB\shard\logs\shard2_2.log" --logappend --dbpath "C:\MongoDB\shard\sdata2_2"

  7. mongo -host localhost -port 27014

  8. rs.initiate(
      {
        _id: "rs2",
        members: [{ _id : 0, host : "localhost:27014" },
          { _id : 1, host : "localhost:27015" }]
      }
    )

Now Registering the shards with mongos

Now that we have created our two mongodb shards clusters running at localhost:27012, localhost:27013, localhost:27014 and localhost:27015 respectively, we will go ahead and register these shards with our mongos query router, also define which database we need to shard and then enable sharding on the collection we are interested by providing the shard key. All these have to be carried out by connecting to the mongos query router as shown in the below commands:

mongo --port 27011 --host localhost

sh.addShard("localhost:27012")

sh.addShard("localhost:27013")

sh.addShard("localhost:27014")

sh.addShard("localhost:27015")


I will be using the students database having collection grades. The structure of the documents in grades is given below:


  "student_id" : 0,
  "type" : "exam",
  "score" : 54.6535436362647
}

sh.enableSharding("students")

sh.shardCollection("students.grades", {"student_id" : 1})

In the sh.shardCollection we specify the collection and the field from the collection which is to be used as a shard key.

Adding data to the mongodb sharded cluster

for ( i = 1; i < 10000; i++ ) {
  db.grades.insert({student_id: i, type: "exam", score : Math.random() * 100 });
  db.grades.insert({student_id: i, type: "quiz", score : Math.random() * 100 });
  db.grades.insert({student_id: i, type: "homework", score : Math.random() * 100 });
}

Lets look at the status of the shards by connecting to the mongos. It can be achieved by using the sh.status() command.
mongo --port 27011 --host localhost 
mongos>  sh.status()  
                                                                                                        
--- Sharding Status ---                                                                                                        
  sharding version: {                                                                                                          
        "_id" : 1,                                                                                                             
        "minCompatibleVersion" : 5,                                                                                            
        "currentVersion" : 6,                                                                                                  
        "clusterId" : ObjectId("5a2aadadeb346a72fb38f209")                                                                     
}                                                                                                                              
  shards:                                                                                                                      
        {  "_id" : "rs1",  "host" : "rs1/localhost:27012,localhost:27013",  "state" : 1 }                                      
        {  "_id" : "rs2",  "host" : "rs2/localhost:27014,localhost:27015",  "state" : 1 }                                      
  active mongoses:                                                                                                             
        "3.4.7" : 1                                                                                                            
 autosplit:                                                                                                                    
        Currently enabled: yes                                                                                                 
  balancer:                                                                                                                    
        Currently enabled:  yes                                                                                                
        Currently running:  no                                                                                                 
                Balancer lock taken at Mon Jan 08 2018 12:04:06 GMT+0530 (India Standard Time) by ConfigServer:Balancer        
        Failed balancer rounds in last 5 attempts:  5                                                                          
        Last reported error:  could not find host matching read preference { mode: "primary" } for set rs2                     
        Time of Reported error:  Mon Jan 01 2018 20:18:34 GMT+0530 (India Standard Time)                                       
        Migration Results for the last 24 hours:                                                                               
                No recent migrations                                                                                           
  databases:                                                                                                                   
        {  "_id" : "students",  "primary" : "rs1",  "partitioned" : true }                                                     
                students.grades                                                                                                
                        shard key: { "student_id" : 1 }                                                                        
                        unique: false                                                                                          
                        balancing: true                                                                                        
                        chunks:                                                                                                
                                rs1     5                                                                                      
                                rs2     4                                                                                      
                        { "student_id" : { "$minKey" : 1 } } -->> { "student_id" : 1 } on : rs2 Timestamp(2, 0)                
                        { "student_id" : 1 } -->> { "student_id" : 6 } on : rs2 Timestamp(3, 0)                                
                        { "student_id" : 6 } -->> { "student_id" : 38310 } on : rs2 Timestamp(4, 0)                            
                        { "student_id" : 38310 } -->> { "student_id" : 82368 } on : rs2 Timestamp(5, 0)                        
                        { "student_id" : 82368 } -->> { "student_id" : 120672 } on : rs1 Timestamp(5, 1)                       
                        { "student_id" : 120672 } -->> { "student_id" : 159241 } on : rs1 Timestamp(3, 3)                      
                        { "student_id" : 159241 } -->> { "student_id" : 197545 } on : rs1 Timestamp(4, 2)                      
                        { "student_id" : 197545 } -->> { "student_id" : 241433 } on : rs1 Timestamp(4, 3)                      
                        { "student_id" : 241433 } -->> { "student_id" : { "$maxKey" : 1 } } on : rs1 Timestamp(4, 4)           
        {  "_id" : "uts",  "primary" : "rs2",  "partitioned" : true }                                                          
                uts.irctc                                                                                                      
                        shard key: { "userid" : 1 }                                                                            
                        unique: false                                                                                          
                        balancing: true                                                                                        
                        chunks:                                                                                                
                                rs1     1                                                                                      
                                rs2     2                                                                                      
                        { "userid" : { "$minKey" : 1 } } -->> { "userid" : 1933 } on : rs2 Timestamp(2, 1)                     
                        { "userid" : 1933 } -->> { "userid" : 8087 } on : rs2 Timestamp(1, 2)                                  
                        { "userid" : 8087 } -->> { "userid" : { "$maxKey" : 1 } } on : rs1 Timestamp(2, 0)                     

We can also connect to individual shards and query to find out the max and minimum student ids available.
mongo --host localhost --port 27012
>use students
>db.grades.find().sort({student_id : 1}).limit(1)

same thing u can do with others shard servers.
Lets execute the same set of queries on the mongos query router and see that the results this time will include data from all the shards and not just individual shard.

mongos --configdb rs0/localhost:27009,localhost:27010 --port 27011

> use students
> db.grades.find().sort({student_id:-1}).limit(1).pretty()

So this brings to the end of setting up sharded mongodb cluster on localhost. Hope this information will help others!