Database Noisy Neighbours
On a recent project we decided to completely overhaul our database system. The existing system used a separate database per customer, which meant we had a lot of database connection pools that could not be re-used. Moreover, it complicated the logic needed to connect to the correct database. We wanted a simpler, more unified model with all of our data in a single database. The story took a bit of an unexpected turn and caused us some headaches.
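To make the starting point concrete, here is a rough sketch of what the per-customer setup looked like from the application side. The library (mysql-connector-python), the customer names and the credentials are made up for illustration; this is not our actual code.

```python
# Hypothetical sketch of a per-customer database setup with one pool per tenant.
from mysql.connector.pooling import MySQLConnectionPool

# One database (and thus one connection pool) per customer; most pools
# sit idle most of the time but still consume memory and connections.
CUSTOMER_DATABASES = {
    "acme": "tenant_acme",
    "globex": "tenant_globex",
}

POOLS = {
    customer: MySQLConnectionPool(
        pool_name=f"pool_{customer}",
        pool_size=5,
        host="db-server",
        user="app",
        password="secret",
        database=db_name,
    )
    for customer, db_name in CUSTOMER_DATABASES.items()
}

def get_connection(customer: str):
    # The extra routing step every request had to go through.
    return POOLS[customer].get_connection()
```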
First of all, a lot of changes were required to make it work, such as remapping ids, adding a new customer field to every entry and checking it everywhere to prevent customers from seeing or requesting each other's data. This was already tricky enough, but in the end we found a solution that was ready for the final migration towards a single database.
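As an illustration of the kind of change involved: a query that used to run against a customer's own database now has to be scoped explicitly to that customer. The table and column names below are made up for the example.

```python
# Hypothetical example of tenant scoping after the merge into one database.
import mysql.connector

conn = mysql.connector.connect(
    host="db-server", user="app", password="secret", database="app_db"
)

def fetch_orders(customer_id: int):
    cursor = conn.cursor()
    # The new customer column has to be checked on every query so that
    # one customer can never see another customer's rows.
    cursor.execute(
        "SELECT id, status, total FROM orders WHERE customer_id = %s",
        (customer_id,),
    )
    rows = cursor.fetchall()
    cursor.close()
    return rows
```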
I did not expect the database to suffer so much from having all this data collapsed together. I assumed that a modern database could handle huge amounts of data as long as the database model was designed properly, the right indices were added and the queries were sound. But I guess my assumptions were a bit optimistic, since the performance of our application dropped a lot. Before I go over the factors behind this drop in performance, I will first explain why I didn't expect a big issue.
First of all, modern SQL databases are very performant and should be able to process thousands of entries in no time. While some of our tables contain a large number of rows, nothing really scared me, especially since in most cases you only extract a fraction of those rows. Hence a good index should suffice to filter the data down to a manageable amount.
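For reference, the kind of index I had in mind is simply one on the new customer column, roughly as in the sketch below; the table name is made up, and the EXPLAIN is only there to check that a tenant-scoped query actually uses the index.

```python
# Hypothetical example: an index on the customer column plus a quick sanity check.
import mysql.connector

conn = mysql.connector.connect(
    host="db-server", user="app", password="secret", database="app_db"
)
cursor = conn.cursor()

# Let the database narrow the shared table down to one customer's rows quickly.
cursor.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# Verify that the optimizer actually picks the index for a tenant-scoped query.
cursor.execute("EXPLAIN SELECT id, status FROM orders WHERE customer_id = %s", (42,))
for row in cursor.fetchall():
    print(row)

cursor.close()
```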
Secondly, even though the old setup had a different database per customer, they were all located on the same database server. The main point of having separate databases is that you can put them on different servers, as it is typically I/O, CPU or memory usage that seeps over from one customer to another. That was not a problem in our old setup, so why would it be a problem with a single database? We still had the same disk, CPU and memory available. If anything, I expected our memory (and maybe even CPU) usage to go down, as we would no longer need to keep all those connection pools around.
The only factor we thought could cause issues was simply that the tables would become much bigger, but as I mentioned before, with an index on the customer column the database should be able to quickly narrow a table down to the rows of that customer. When we went live with the single database, however, we quickly saw degraded performance and rushed to apply some quick fixes to try to get performance back up to the previous level.
So what caused this to go so wrong? What did I not foresee? Well, a couple of factors come into play. First of all, the database is MySQL 5.7 using the MyISAM engine, which you can't really call up-to-date, but due to circumstances we can't upgrade the version. Would a more recent version of MySQL have performed better? Maybe. Would the InnoDB engine perform better? We ran tests with it, and nothing indicated any better performance. We still don't understand exactly why, since InnoDB should be more advanced than MyISAM, but it is what it is. The fact is that MyISAM is very limited: it doesn't support transactions, combining indices or partitioning your data, which limited our options to improve performance.
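To give an idea of the kind of option that was off the table for us: on an engine that does support it, the big shared table could have been partitioned by customer. The sketch below is purely hypothetical (made-up table, InnoDB), and note that MySQL requires the partitioning column to be part of every unique key, hence the composite primary key.

```python
# Hypothetical sketch of partitioning by customer, an option our MyISAM setup ruled out.
import mysql.connector

conn = mysql.connector.connect(
    host="db-server", user="app", password="secret", database="app_db"
)
cursor = conn.cursor()
cursor.execute(
    """
    CREATE TABLE orders_partitioned (
        id BIGINT NOT NULL AUTO_INCREMENT,
        customer_id INT NOT NULL,
        status VARCHAR(32),
        total DECIMAL(10, 2),
        PRIMARY KEY (id, customer_id)  -- partition column must be part of every unique key
    ) ENGINE=InnoDB
    PARTITION BY HASH (customer_id) PARTITIONS 16
    """
)
cursor.close()
```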
Another problem with the MyISAM engine is that it takes big table locks; it does this because it doesn't support transactions and still has to guarantee consistency. A table lock is a killer, however, when all of your customers are working on the same table.
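You can actually watch this happen: while a slow write for one customer is running, the queries of every other customer on that table queue up behind the lock and show up in the process list with a state such as 'Waiting for table level lock'. The snippet below is only an illustration with made-up names.

```python
# Hypothetical illustration: one customer's slow write on a MyISAM table
# blocks every other customer's queries on that same table.
import threading
import time

import mysql.connector

def connect():
    return mysql.connector.connect(
        host="db-server", user="app", password="secret", database="app_db"
    )

def slow_write():
    # Even a write scoped to a single customer takes a write lock on the *whole* table.
    conn = connect()
    cur = conn.cursor()
    cur.execute("UPDATE orders SET status = 'archived' WHERE customer_id = %s", (42,))
    conn.close()

# Kick off the slow write in the background...
threading.Thread(target=slow_write).start()
time.sleep(1)

# ...and look at the process list: in production, other customers' queries on the
# same table pile up behind the lock with a 'Waiting for table level lock' state.
monitor = connect()
cur = monitor.cursor()
cur.execute("SHOW PROCESSLIST")
for row in cur.fetchall():
    print(row)
```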
The database is not the only culprit, however; our application (which is old and poorly written) is to blame as well. It often executes too many queries, many of them poorly written, which adds load on the database. This did not change compared to the previous setup with a database per customer, but the impact of a poorly written query becomes much worse when you execute it against a big table.
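A hypothetical example of the kind of pattern I mean: a query-per-row loop that was tolerable on a small per-customer database becomes a real burden on one big shared table, while a single joined query lets the database do the work once. The table names are again made up.

```python
# Hypothetical contrast between a chatty query-per-row pattern and one joined query.
import mysql.connector

conn = mysql.connector.connect(
    host="db-server", user="app", password="secret", database="app_db"
)

def order_totals_chatty(customer_id: int):
    # One extra query per order: cheap on a small per-customer database,
    # painful when every query hits one big shared table.
    cursor = conn.cursor()
    cursor.execute("SELECT id FROM orders WHERE customer_id = %s", (customer_id,))
    totals = {}
    for (order_id,) in cursor.fetchall():
        cursor.execute(
            "SELECT SUM(amount) FROM order_lines WHERE order_id = %s", (order_id,)
        )
        totals[order_id] = cursor.fetchone()[0]
    cursor.close()
    return totals

def order_totals_single_query(customer_id: int):
    # One aggregated query that lets the database use its indexes once.
    cursor = conn.cursor()
    cursor.execute(
        """
        SELECT o.id, SUM(l.amount)
        FROM orders o
        JOIN order_lines l ON l.order_id = o.id
        WHERE o.customer_id = %s
        GROUP BY o.id
        """,
        (customer_id,),
    )
    result = dict(cursor.fetchall())
    cursor.close()
    return result
```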
With some fine-tuning of the MySQL settings, adding some missing indices and even rewriting some queries, we managed to get the application performing at the same level as before the migration. But it is a fragile baseline: if one of our customers does something that fetches a lot of data from the database, our entire application, and therefore all of our customers, is impacted. This was not the case before the migration and is exactly what the noisy neighbour problem is all about.
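For a flavour of what that fine-tuning looked like: settings such as the MyISAM key buffer can be adjusted at runtime, and missing indices can be added to match frequent tenant-scoped filters. The values and names below are placeholders, not our production configuration.

```python
# Hypothetical sketch of the tuning steps: a MyISAM-related setting and a missing index.
import mysql.connector

conn = mysql.connector.connect(
    host="db-server", user="admin", password="secret", database="app_db"
)
cursor = conn.cursor()

# key_buffer_size is the cache MyISAM uses for index blocks; the value is a placeholder.
cursor.execute("SET GLOBAL key_buffer_size = %s", (512 * 1024 * 1024,))

# One of the "missing indices": an index matching a frequent tenant-scoped filter.
cursor.execute("CREATE INDEX idx_orders_customer_status ON orders (customer_id, status)")

cursor.close()
```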
But even after this painful experience, I would still not design a multi-tenant system with a database per customer unless there is a good reason for it. This project has too many things going against it to serve as an example against a single-database system, and I still believe that a modern database would not suffer this much from such a setup. The thing is, the single-database setup has a lot of advantages as well. Our database migrations are much faster, as they only need to be executed on a single database instead of on each customer database separately. We did see a decrease in RAM usage because we no longer need a connection pool per database (which most of the time was not being used). The available resources are now used more efficiently; the only bottleneck I still see is the disk we have on the server.
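To make the migration point concrete: a schema change that previously had to be looped over every customer database now runs exactly once. A rough before/after sketch with made-up names:

```python
# Hypothetical before/after sketch of running a schema migration.
import mysql.connector

MIGRATION = "ALTER TABLE orders ADD COLUMN archived_at DATETIME NULL"

def migrate_per_customer(customer_databases: list[str]):
    # Before: the same ALTER executed once per customer database.
    for db_name in customer_databases:
        conn = mysql.connector.connect(
            host="db-server", user="app", password="secret", database=db_name
        )
        conn.cursor().execute(MIGRATION)
        conn.close()

def migrate_single_database():
    # After: one database, one run.
    conn = mysql.connector.connect(
        host="db-server", user="app", password="secret", database="app_db"
    )
    conn.cursor().execute(MIGRATION)
    conn.close()
```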
An interesting aside: while we were doing the migration towards a single database, we heard of another company doing the opposite. They went from a single database to a per-customer database. It is impossible to tell who is right, since our applications and use cases may be completely different. And who knows, maybe in 10 or 15 years this project will be converted back into a per-customer database application.