At what size of data does MySQL stop being performant?

Is this the right question?

Somebody asked me this question and I took the chance to write an article about the topic. Questions like this always lack context, but there might still be a correct answer.


You might be asking the wrong question here, because you usually don't reach the performance limits of MySQL very easily. When there is a performance problem, it is tempting to blame the software without understanding it. So you need to ask yourself the right questions to get to the core of the problem:

Status quo – you run a DB (containing a few tables) with an unknown number of entries and a size of less than 1GB: From my experience, at that size you can't reach any limit in MySQL that would affect performance. So the problem must be somewhere else – let's go back to understanding MySQL.

(Disclaimer: Some of the explanations are not 100% accurate, because I only want to give a good overview and general understanding – if you have some input, please leave a comment at the end of the article, thank you.)

Understanding the basics

Understanding the basic internal workflow of MySQL will help explain how queries can cause performance issues.

MySQL has a number of great features to ensure it delivers optimal performance. But like most other programs that run on a server and store data, it depends on hardware performance. For this purpose, I would like to compare MySQL to an ordinary book.

Depending on the storage engine, MySQL stores the data of each table in its own file, so every table in the DB can be compared to a book. The entries in the table correlate to the pages of that book. When a query is sent to the MySQL server, it looks through the corresponding data files for matching entries. Back in our book example, this would mean starting at the first page and going through the whole book, checking each page for a match with the search criteria (the WHERE clause). In general this is a very time-consuming way to search. Both book authors and the MySQL developers had the same simple idea to reduce search time.
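To make this concrete, here is a minimal sketch (the `books` table and all column names are hypothetical) showing how EXPLAIN reveals such a page-by-page read, i.e. a full table scan, when no index matches the WHERE clause:

```sql
-- Hypothetical example table; names are illustrative only.
CREATE TABLE books (
    id    INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255),
    pages INT
);

-- Without an index on `title`, MySQL has to check every row,
-- just like reading a book page by page:
EXPLAIN SELECT * FROM books WHERE title = 'MySQL Basics';
-- The EXPLAIN output typically shows type: ALL (a full table scan).
```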

Search faster with an index

Book authors had the idea of creating a “table of contents” to make it easier to find things in a large book. The same basic concept is implemented in MySQL and is called an “index”. If you search for a chapter using the table of contents of a book, a few pages of “table of contents” are enough to tell you on which pages the searched information can be found. The same method is used with an index in a MySQL table: the server checks the index and is able to find the matching entries much faster. Additionally, the index is kept (as far as possible) in the memory of the server, which is much faster to access than the hard disks. Of course there are a number of other factors that can influence the speed of the index and therefore the query speed, for example when the index is too big to fit into the RAM of the server. But I think you get the overall concept.
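As a sketch of the idea – assuming a hypothetical `books` table with a `title` column – creating an index and re-checking the query plan might look like this:

```sql
-- Create an index on the column used in the WHERE clause
-- (table and column names are hypothetical):
CREATE INDEX idx_books_title ON books (title);

-- Re-check the query plan:
EXPLAIN SELECT * FROM books WHERE title = 'MySQL Basics';
-- With the index in place, EXPLAIN typically reports type: ref
-- and key: idx_books_title instead of a full table scan (type: ALL).
```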

The index size

While talking about index size, there are a few criteria you should keep in mind. The index grows with the content of the indexed fields. It is, again, like the table of contents of a book. If the name of every chapter is one word long, every entry is just one line. But if the chapter name is a long sentence, the entry in the table of contents takes two or more lines, which necessarily leads to a larger “table of contents” spanning more pages.

And, as a matter of course, both the MySQL server and the reader of a book are faster when searching through fewer pages for a specific criterion. In short: a smaller index value is faster to compare than a long string. That said, it is more important to use indexes at all than to get each index as small as possible.
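MySQL also supports prefix indexes, which index only the first N characters of a string column – one way to keep the index small when the indexed values are long. A sketch with a hypothetical table:

```sql
-- Hypothetical table with a long string column:
CREATE TABLE chapters (
    id    INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(500)
);

-- Index only the first 20 characters of `title`; the index stays
-- small, at the cost of some selectivity for longer titles:
CREATE INDEX idx_chapters_title ON chapters (title(20));
```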

Using an index but the query is still slow?

If you have created an index and some queries are still slow, you should check the queries themselves. If, for example, a query has multiple conditions in its WHERE clause, you should make sure that ALL of the columns in the WHERE clause are covered by an index. Otherwise the query will be slowed down.
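One common way to cover multiple WHERE conditions is a composite index spanning all of the queried columns. A sketch, using a hypothetical `orders` table:

```sql
-- Hypothetical query with two conditions in the WHERE clause:
--   SELECT * FROM orders WHERE customer_id = 42 AND status = 'open';

-- A composite index covering both columns lets MySQL resolve
-- the whole WHERE clause via the index:
CREATE INDEX idx_orders_customer_status ON orders (customer_id, status);
```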

How do I find slow queries?

MySQL provides functionality to find slow queries; it can be configured in the my.cnf MySQL configuration file. The “Slow Query Log” can be enabled with a time limit. Enabling this feature will log every query that exceeds the configured time limit to the slow query log file. Nevertheless, from my operational experience, this can produce misleading log entries in some special cases, showing queries that are not actually slow.
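As a sketch, enabling the slow query log in the configuration file might look like the following (the log file path is illustrative; these option names apply to recent MySQL versions – older releases used slightly different option names):

```
[mysqld]
slow_query_log      = 1
slow_query_log_file = /var/log/mysql/slow-query.log
long_query_time     = 2   # log queries running longer than 2 seconds
```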

When you run a very slow query, the MySQL server attempts to execute it as fast as possible, but may need some time to finish. Let us assume that a query takes at least 30 seconds due to a missing index. While this query is running, all other queries slow down as well. If we are operating in an environment where the MySQL database constantly receives a few requests per second, what will happen? All requests coming into MySQL will slow down because of the very slow query. More and more incoming queries will blow up the execution queue, and the execution time of all queries increases until the very long query has finished. In such a situation the MySQL server starts to report every query as a “slow query” as soon as it hits the configured time limit. Of course the real cause of the situation will be logged as well, but it can be a little tricky to find the important entry among all the misleading pseudo “slow queries”.

Of course this is a worst-case scenario, but it shows how a few very slow queries can have a negative impact on the overall performance of a MySQL database. This worst case is, by the way, not a fictive scenario. I experienced such a situation with about 4GB of data in a table of about 100 million entries – a query with a WHERE clause on a non-indexed field ran for 30 minutes. During this time all other queries queued up until the maximum number of connections of the MySQL server was reached. As a result, new connections, and therefore new queries, were rejected.

MySQL profilers

There are a number of MySQL profilers available that show better results. These profilers all use the same information about queries and query execution times that the MySQL server can deliver, but the way they present it for query optimization differs and can help you get better performance out of the MySQL server.

One profiler I would like to recommend is “Jet Profiler for MySQL”. Among much other functionality, it can provide a list of all queries sorted by total execution time. In contrast to the slow query log, the queries are sorted by consolidated execution time. This means that the literal values are stripped from each query and all structurally identical queries are summed up. This provides a very good ranking of the queries using the most time on the MySQL server.

Conclusion

What does all that have to do with the initial question? Well, it teaches us that the MySQL server can, in most cases, provide much more power than we think. In cases like the ones described, it is not a performance problem of the MySQL server because of too much data. Instead, it shows us that the database and table layout are far more important. We should try to understand how the MySQL server works and how to optimize our tables and queries to get the full power of MySQL for our applications!


Read more of my posts on my blog at http://blog.tinned-software.net/.
