Is MongoDB always faster then MySQL?

This is another classic question you might have come across, but can this question simply be answered with a YES or NO? Well, I would not write about it if the answer was that easy!

MongoDB has a lot of fans, many of them amazed with its fancy features and its speed. The question is not so stupid at all, as MongoDB is promoted as being amazingly fast and way faster and better than MySQL, but still, real life is not that simple.

Advertisements

Let’s have a quick look into the internal workings of these database engines to answer the question.

How does MySQL process queries?

MySQL is based on quite an old database concept. That’s why it has some issues to deal with. When you think about the history of MySQL, it might explain why details in implementations are done as they are. That said, MySQL also makes use of modern techniques, like using multiple threads to boost performance; when multiple queries need to be processed at once, several threads are used to work on the queries in parallel. So if we have a high load situation with a large number of queries, MySQL processes some number of those queries simultaneously. To avoid collisions while a query is writing data, MySQL uses a technique called locks to block write access and protect against alternating write access to the same entry. Depending on the storage engine you use, a mechanism called row-locking may be used. With row-locking, if a query writes data to an row, the locking mechanism will block other write access to the same row of the table until the first write process has been finished. At the same time, other queries can change data in other rows of this table, but not the one being currently written.

How does MongoDB process queries?

The internal concept of MongoDB is completely different to that of MySQL. In comparing them here, I would like to point out one important difference and ignore the others: in MongoDB queries are not processed simultaneously. This means that all the write queries that are sent to a MongoDB server are queued. The server processes each query one by one (at the moment of writing, it makes no difference if the queries are sent to the same collection [in SQL terms table], database or not). MongoDB uses a technique called instance-wide-locking. This means that the entire server process can only run one write query at a time. To remove this bottle-neck a little bit, MongoDB is currently switching over to database-wide-locks, as mentioned in the latest release notes.

So what does this imply? A instance-wide-lock allows the entire MongoDB server to process only one write query at time. A database-wide-lock allows MongoDB to process one write query per database, but it is able to use multiple databases simultaneously. That might sound horrible to you, but once you remember that queries in MongoDB are processed a lot faster than in MySQL you’ll see that this is not a massive problem. Of course, all of this only applies if you setup the database and collections correctly!

How does MongoDB compare to MySQL?

In terms of performance of a single simple query, MongoDB will be a lot faster than MySQL. In my experience, you will notice differences under heavy load situations in some special situations. In MySQL all competent database administrators should be aware of the importance of indexes. Compared to MongoDB it seems that most administrators don’t even know that something like indexes exist! That might come from the fact, that MongoDB is a structureless database, so why should I care? But I want to point out here that proper index usage is at least as important as with other database engines.

How important are indexes in MongoDB?

As MongoDB uses a instance-wide-lock or database-wide-lock for write queries, it may be much more important to set indexes properly than it is in MySQL. Let me give you an example of my experiences, in this case let us assume we are still using instance-wide-locking.

Imagine you have a table with a few million entries, each entry a few fields long in both a MongoDB database and a MySQL database. There are currently no indexes defined. If a query is executed to update approximately 100 of the entries according to a specific field value in one of the fields, what is going to happen?

MySQL will start processing the query in one thread and as a matter of course it has to perform a “full table scan” to search all the millions of entries to find those rows to be updated. Maybe this query will take about 20 minutes in our imagined environment.

MongoDB will run the same query, in let’s say, about 5 minutes. This sound great, doesn’t it? In fact, the MongoDB server has to scan through the whole collection (in SQL terms table) to find the related entries for updating. As discussed before, during these 5 minutes MongoDB will lock the entire server process for writing access. You may think 5 minutes still sounds way faster, doesn’t it? So is MongoDB better because it is faster?

The times of a single query compared to the overall impact on the system are two different things. MySQL still processes other write queries to the database server, even to the same database or table. While the server is busy with the “long running query”, resources are used and not free for other queries, and so those other queries may run slightly slower, but they are still processed.

But what happens in MongoDB might come as an eye opener. While the 5 minute query is running on MongoDB, a instance-wide-lock is active. This will cause all other write queries to be queued, without exception. They all have to wait for the 5 minute query to finish and then they will be executed one by one. This also is true when the underlying connection has already given up, due to a timeout or any other reason. I guess, beside other reasons, the main reason for the lock here is consistency. This long query execution time of, in our example, 5 minutes for a single query can cause timeouts in your application. For example, a PHP script has a timeout after which the script is terminated, and especially for web applications, the web-browser has a much shorter timeout than 5 minutes after which it takes the connection to the server as being broken.

As mentioned, queries that are queued, even if the connection from the client to the MongoDB server is already closed, remain queued and will be executed in the order that they arrived.

On both systems, properly defined indexes can massively reduce query execution time.

Conclusion

Even if MongoDB performs a lot faster on single simple queries; this doesn’t mean it is faster all the time, and even structure-less databases like this need some kind of organization; defining of indexes is a very important task – missing indexes can have a huge impact, even if the execution time of each query is shorter than with MySQL.

As the MongoDB development road map  shows, the described issue is something the developers are aware of, and the move from instance-wide-locking to database-wide-locking is a big step in the right direction. Hopefully more useful features are coming soon.

And for anyone who thinks that there can be a one-size-fits-all database engine that is the best tool for any application, let me state one thing: There is a proper tool for every job and for every tool there is a right place to use it. This applies to database engines just as well as any other tool, software or not!


Read more of my posts on my blog at http://blog.tinned-software.net/.

This entry was posted in Database and tagged , , , , . Bookmark the permalink.