Why MongoDB?
- Document-oriented
  - Documents (objects) map nicely to programming language data types
  - Embedded documents and arrays reduce the need for joins
  - Dynamically typed (schemaless) for easy schema evolution
  - No joins and no multi-document transactions, for high performance and easy scalability
- High performance
  - No joins and embedding make reads and writes fast
  - Indexes, including indexing of keys from embedded documents and arrays
  - Optional streaming writes (no acknowledgements)
- High availability
  - Replicated servers with automatic master failover
- Easy scalability
  - Automatic sharding (auto-partitioning of data across servers)
  - Reads and writes are distributed over shards
  - No joins or multi-document transactions make distributed queries easy and fast
  - Eventually-consistent reads can be distributed over replicated servers
Mongo data model
- A Mongo system (see deployment above) holds a set of databases
- A database holds a set of collections
- A collection holds a set of documents
- A document is a set of fields
- A field is a key-value pair
- A key is a name (string)
- A value is
  - a basic type like string, integer, float, timestamp, binary, etc.,
  - a document, or
  - an array of values
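For illustration, here is one hypothetical document that uses all three kinds of values; the field names match the examples in the query section below:

{
    name: {first: 'John', last: 'Doe'},    // embedded document
    keywords: ['storage', 'DBMS'],          // array of values
    views: 1234                             // basic type (integer)
}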
Mongo query language
- To retrieve certain documents from a collection, you supply a query document containing the fields the desired documents should match. For example, {name: {first: 'John', last: 'Doe'}} will match all documents in the collection whose name is John Doe. Likewise, {'name.last': 'Doe'} will match all documents with a last name of Doe. Also, {'name.last': /^D/} will match all documents with a last name starting with 'D' (regular-expression match).
- Queries also match inside embedded arrays. For example, {keywords: 'storage'} will match all documents with 'storage' in their keywords array. Likewise, {keywords: {$in: ['storage', 'DBMS']}} will match all documents with 'storage' or 'DBMS' in their keywords array.
- If you have lots of documents in a collection and you want to make a query fast, build an index for that query, for example ensureIndex({'name.last': 1}) or ensureIndex({keywords: 1}). Note that indexes occupy space and slow down updates a bit, so use them only when the tradeoff is worth it.
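Putting these together, a minimal mongo shell session might look like this (the collection name people is hypothetical):

> db.people.find({name: {first: 'John', last: 'Doe'}})     // exact match on the embedded name document
> db.people.find({'name.last': 'Doe'})                     // match a field inside the embedded document
> db.people.find({'name.last': /^D/})                      // regular-expression match
> db.people.find({keywords: {$in: ['storage', 'DBMS']}})   // match inside the keywords array
> db.people.ensureIndex({'name.last': 1})                  // index to speed up the last-name queries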
Install MongoDB on Ubuntu 10.04
Configure Package Management System (APT)
The Ubuntu package management tools (dpkg and apt) ensure package consistency and authenticity by requiring that distributors sign packages with GPG keys. Issue the following command to import the 10gen public GPG key:
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10
Create a /etc/apt/sources.list.d/10gen.list file and include the following line for the 10gen repository.
deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen
Now issue the following command to reload the package database:
sudo apt-get update
Install Packages
Issue the following command to install the latest stable version of MongoDB:
sudo apt-get install mongodb-10gen
When this command completes, you have successfully installed MongoDB! Continue for configuration and start-up suggestions.
Configure MongoDB
These packages configure MongoDB using the /etc/mongodb.conf file in conjunction with the control script. You will find the control script is at /etc/init.d/mongodb.
This MongoDB instance will store its data files in /var/lib/mongodb and its log files in /var/log/mongodb, and will run as the mongodb user account.
Note
If you change the user that runs the MongoDB process, you will need to modify the access control rights to the /var/lib/mongodb and /var/log/mongodb directories.
Controlling MongoDB
Starting MongoDB
You can start the mongod process by issuing the following command:
sudo service mongodb start
You can verify that mongod has started successfully by checking the contents of the log file at /var/log/mongodb/mongodb.log.
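For example, you can tail the log and look for a line saying mongod is waiting for connections (path as configured above):

sudo tail /var/log/mongodb/mongodb.log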
Stopping MongoDB
As needed, you may stop the mongod process by issuing the following command:
sudo service mongodb stop
Restarting MongoDB
You may restart the mongod process by issuing the following command:
sudo service mongodb restart
Controlling mongos
As of the current release, there are no control scripts for mongos. mongos is only used in sharding deployments and typically does not run on the same systems as mongod. You can use the mongodb control script referenced above as a starting point for your own mongos control script.
Using MongoDB
Among the tools included with the MongoDB package is the mongo shell. You can connect to your MongoDB instance by issuing the following command at the system prompt:
mongo
> show dbs                                    // show your databases
> use <databasename>                          // switch databases
> db.createCollection("collectionname")       // create a collection
> db.collectionname.find()                    // see the contents of the collection
> db.addUser("theadmin", "anadminpassword")   // create a user and password
MongoDB performance monitoring :-
To monitor a database system we can use mongotop.
mongotop tracks and reports the current read and write activity of a MongoDB instance.
mongotop provides per-collection visibility into use.
Use mongotop to verify that activity and use match expectations.
mongotop returns time values specified in milliseconds (ms).
mongotop reports only active namespaces or databases, depending on the --locks option.
If you don't see a database or collection, it has received no recent activity.
By default mongotop connects to the MongoDB instance running on localhost port 27017. However, mongotop can optionally connect to remote mongod instances.
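For example, the following reports per-collection activity every 15 seconds against a remote instance (the hostname is hypothetical):

mongotop --host db1.example.net --port 27017 15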
Next, we can use mongostat.
mongostat captures and returns counters of database operations. mongostat reports operations on a per-type (e.g. insert, query, update, delete) basis. This format makes it easy to understand the distribution of load on the server. Use mongostat to understand the distribution of operation types and to inform capacity planning.
The mongostat utility provides a quick overview of the status of a currently running mongod or mongos instance. mongostat is functionally similar to the UNIX/Linux utility vmstat, but provides data regarding mongod and mongos instances.
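For example, the following prints one status line every 5 seconds and stops after 10 rows (both the sleep interval and --rowcount are standard mongostat arguments):

mongostat --rowcount 10 5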
Use db.serverStatus()
It provides an overview of the database process’s state.
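You can also drill into individual sections of its output from the shell, for example:

> db.serverStatus().connections    // just the connection counters
> db.serverStatus().mem            // just the memory section (discussed further below)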
Then the REST interface
MongoDB provides a REST interface that exposes diagnostic and monitoring information in a simple web page. Enable it by setting rest to true; the page is served on the localhost interface, on a port number 1000 higher than the database port. In the default configuration the REST interface is therefore accessible on port 28017. For example, to access the REST interface on a locally running mongod instance: http://localhost:28017
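With the Ubuntu packages above, one way to enable it is to add the following line to /etc/mongodb.conf and restart mongod (rest is a standard mongod configuration option in this release series):

rest = true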
These are a few basic tips on making your application better/faster/stronger without knowing anything about indexes or sharding.
Connecting
Connecting to the database is a (relatively) expensive operation. Try to minimize the number of times you connect and disconnect: use persistent connections or connection pooling (depending on your language).
However, there are some side effects with the following PHP connection code:
$connection = new Mongo();
$connection->connect();
In this code it appears the user simply wants to create a new connection. However, under the hood the following is happening:
- The constructor connects to the database.
- connect() sees that you're already connected and assumes you want to reset the connection.
- It disconnects from the database.
- It connects again.
The result is that you have doubled your execution time. The fix is to drop the explicit connect() call and let the constructor do the work.
ObjectIds
Some developers are uncomfortable with ObjectIds, so they convert their ObjectIds into strings. The problem is that an ObjectId takes up 12 bytes, while its string representation takes up 29 bytes (almost two and a half times bigger).
Numbers vs. Strings
MongoDB is type-sensitive and it’s important to use the correct type: numbers for numeric values and strings for strings.
If you have large numbers and you save them as strings ("1234567890" instead of 1234567890), MongoDB may slow down as it strcmps the entire length of the number instead of doing a quicker numeric comparison. Also, "12" is going to sort as less than "9", because MongoDB will use string, not numeric, comparison on the values. This can lead to subtle errors.
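A quick way to see the sorting difference in the shell (using a hypothetical throwaway collection):

> db.nums.insert({n: 9, s: '9'})
> db.nums.insert({n: 12, s: '12'})
> db.nums.find().sort({n: 1})    // numeric order: 9 comes before 12
> db.nums.find().sort({s: 1})    // string order: '12' sorts before '9'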
Driver-specific
Find out whether your driver has particular weaknesses (or strengths). For instance, the Perl driver is one of the fastest drivers, but it is not good at decoding Date types (Perl's DateTime objects take a long time to create).
MongoDB adopts a document-oriented format, so it is closer to an RDBMS than to a key-value or column-oriented store.
MongoDB operates primarily in memory and places high performance above data scalability. MongoDB uses BSON for data storage.
Mongo uses memory-mapped files, which means that a lot of the memory reported by tools such as top may not actually represent RAM usage. Check mem["resident"], which tells you how much RAM Mongo is actually using.
"mem" : {
    "resident" : 2,
    "virtual" : 2396,
    "supported" : true,
    "mapped" : 0
},
Backup
There are basically two approaches to backing up a Mongo database:
mongodump and mongorestore are the classic approach: mongodump dumps the contents of the database to files. The backup is stored in the same format Mongo uses internally, so it is very efficient, but it is not a point-in-time snapshot.
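A minimal sketch of this approach (the backup directory is arbitrary):

mongodump --out /backups/mongodump      # dump every database to BSON files under this directory
mongorestore /backups/mongodump         # restore from those files later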
To get a point-in-time snapshot, shut the database down, copy the disk files (e.g. with cp) and then start mongod up again. Alternatively, rather than shutting mongod down before making your point-in-time snapshot, you could just stop it from accepting writes:
> db._adminCommand({fsync: 1, lock: 1})
{
    "info" : "now locked against writes, use db.$cmd.sys.unlock.findOne() to unlock",
    "ok" : 1
}
To unlock the database again, you need to switch to the admin database and then unlock it:
> use admin
switched to db admin
> db.$cmd.sys.unlock.findOne()
{ "ok" : 1, "info" : "unlock requested" }
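With writes locked (or mongod stopped), the snapshot itself is just a file copy. A minimal sketch, assuming the default data directory from the packages above and a hypothetical backup destination:

sudo cp -R /var/lib/mongodb /backups/mongodb-snapshot

Remember to unlock (or restart) mongod once the copy finishes.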
Replication
Start your master and slave up like this:
$ mongod --master --oplogSize 500
$ mongod --slave --source localhost:27017 --port 3000 --dbpath /data/slave
When seeding a new slave server from the master, use the --fastsync option.
You can see what’s going on with these two commands:
> db.printReplicationInfo() # tells you how long your oplog will last
> db.printSlaveReplicationInfo() # tells you how far behind the slave is
If the slave isn't keeping up, check the mongo log for any recent errors. Try connecting with the mongo console. Try running queries from the console to see if everything is working. Run the status commands above to try to find out which database is taking up resources.
Timeout
Connection timeout in milliseconds. Defaults to 20000.
Connection::query_timeout
How many milliseconds to wait for a response from the server. Set to 30000 (30 seconds) by default. -1 waits forever (or until TCP times out, which is usually a long time).
Default pool
The default pool has a maximum of 10 connections per mongod host. This value is controlled by the "connectionsPerHost" option of the driver's connection-options class (MongoOptions in the Java driver).
MongoDB Server Connections
The MongoDB server has a setting called "maxConns", which is the maximum number of simultaneous connections. The default for maxConns is 80% of the file descriptors available for connections. One way to check the current number of connections is to open the mongo shell and execute:
> db.serverStatus()
The standard format of the MongoDB connection URI used to connect to a MongoDB database server is:
mongodb://[username:password@]host1[:port1][,host2[:port2],…[,hostN[:portN]]][/[database][?options]]
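For example, a hypothetical URI for a two-host replica set with authentication (the hostnames, database name, and replica set name are placeholders; the credentials reuse the theadmin user created earlier):

mongodb://theadmin:anadminpassword@db1.example.net:27017,db2.example.net:27017/mydb?replicaSet=rs0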
Finding the Min and Max values in MongoDB
In MongoDB, the min() and max() cursor methods work as limiters, essentially the same as $gte (>=) and $lt (<).
To find the highest (maximum) value in MongoDB, you can use this command:
db.thiscollection.find().sort({"thisfieldname": -1}).limit(1)
This essentially sorts the data by the field name in descending order and takes the first document.
The lowest (minimum) value can be determined in a similar way:
db.thiscollection.find().sort({"thisfieldname": 1}).limit(1)
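To use min() and max() themselves as limiters, the field needs an index and the cursor should be hinted at it; a minimal sketch using the same hypothetical field and arbitrary bounds:

> db.thiscollection.ensureIndex({thisfieldname: 1})
> db.thiscollection.find().min({thisfieldname: 10}).max({thisfieldname: 100}).hint({thisfieldname: 1})   // documents with 10 <= thisfieldname < 100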
Memory Mapped Storage Engine :-
This is the current storage engine for MongoDB, and it uses memory-mapped files for all disk I/O. Using this strategy, the operating system’s virtual memory manager is in charge of caching. This has several implications:
There is no redundancy between file system cache and database cache: they are one and the same.
MongoDB can use all free memory on the server for cache space automatically without any configuration of a cache size.
Virtual memory size and resident size will appear to be very large for the mongod process.
This is benign: virtual memory space will be just larger than the size of the datafiles open and mapped; resident size will vary depending on the amount of memory not used by other processes on the machine.
This command shows the memory usage information :- db.serverStatus().mem
For example :-
> db.serverStatus().mem
{
    "bits" : 64,
    "resident" : 31,
    "virtual" : 146,
    "supported" : true,
    "mapped" : 0,
    "mappedWithJournal" : 0
}
We can verify there is no memory leak in the mongod process by comparing the mem.virtual and mem.mapped values (these values are in megabytes). If you are running with journaling disabled, the difference should be relatively small compared to total RAM on the machine. If you are running with journaling enabled, compare mem.virtual to 2*mem.mapped. Also watch the delta over time; if it is increasing consistently, that could indicate a leak.
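A small shell sketch of that check (this assumes journaling is disabled, so virtual is compared to mapped directly):

> var mem = db.serverStatus().mem
> mem.virtual - mem.mapped    // should stay small and roughly constant over time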
We can also check how much memory is being used for memory-mapped files with the free command.
In the example below, 2652 MB of memory is being used to memory-map files (the "cached" column):
root@manager-desktop:~# free -tm
total used free shared buffers cached
Mem: 3962 3602 359 0 411 2652
-/+ buffers/cache: 538 3423
Swap: 1491 52 1439
Total: 5454 3655 1799
Garbage collection handling :-
When we remove an object from a MongoDB collection, the space it occupied is not automatically garbage-collected, and new records are only appended to the end of the data files, making them grow bigger and bigger. MongoDB maintains lists of deleted blocks within the datafiles when objects or collections are deleted. This space is reused by MongoDB but never freed to the operating system.
To shrink the amount of physical space used by the datafiles themselves, by reclaiming deleted blocks, we must rebuild the database with the command db.repairDatabase(). repairDatabase copies all the database records to new files.
We will need enough free disk space to hold both the old and the new database files while the repair is running, and repairDatabase will take a long time to complete. Also, rather than compacting an entire database,
you can compact just a single collection with db.runCommand({compact: 'collectionname'}).
This does not shrink any datafiles, however; it only defragments deleted space so that larger objects might reuse it.
The compact command will never delete or shrink database files, and in general it requires extra space to do its work.
Thus, it is not a good option when you are running critically low on disk space.
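A minimal sketch of both operations from the mongo shell (the collection name is hypothetical; both are blocking operations, so schedule them in a maintenance window):

> db.runCommand({compact: 'mycollection'})   // defragment a single collection in place
> db.repairDatabase()                        // rebuild the whole database into new datafiles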