Multi-Tenant SQL Queries
In the world of cloud applications, having a multi-tenant application is standard. While there are multiple options to implement multi-tenancy regarding the database/storage approach, having a single database where all data for all tenants resides is pretty common. When choosing this approach, you have to make sure that you don't leak information of different tenants, which is what I will explore in this blog post.
If you have a single database (or even a separate database server) per tenant, or simply don't have a multi-tenant application, fetching data is easy. You simply get all the information, either with some filtering or not and that is all information you can return to the user. In a multi-tenant application you have to keep the tenant into account in order to not show information that belongs to a different tenant. There are 2 solutions for this:
Keep your queries tenant-unaware and do some post-processing.
Make your queries tenant-aware.
The first solution aims to keep your queries the same as if the application was not multi-tenant. The advantage of this is that your queries and thus your method signature of the repository remains simple and can focus on the actual functionality as it doesn't have to take the tenant into account. The main reason why this becomes nearly unusable is that it doesn't scale with the amount of tenants. Imagine you have 10 tenants, fetching a list of resources will return resources of all 10 tenants. In your application logic you now have to manually filter out all of results for the other tenants. It speaks for itself that this is a waste or resources.
Another problems arises when you have pagination and thus you only return a subset of all results. Post filtering would mean that your page will not be completely filled. To fix this your application logic would have to keep querying your database until a full page (or the end) has been reached. Needless to say that this also poses a serious overhead.
A much better approach is to have your queries filter on the tenant by itself. This however means that you have to pass the actual tenant to your repository. The usage of the SecurityHolderContext is sometimes used for this, but as I mentioned in my previous blog post, I am not a fan of this. Although it makes the repository method signatures look like a simple single-tenant application, it is actually this hiding/obscuring of the multi-tenant reality that bothers me. Instead, I prefer to have all of my methods take in an extra tenant id parameter. Another advantage of this, is that you can now have queries that work for a single tenant (for your customer facing application) and queries that work across all tenants (for a back office or monitoring application).
It is important to note that when using queries that filter on tenant id, having an index on the tenant id column is a must. Another remark is that to keep the API consistent, I even pass the tenant id to queries that technically don't need it. For instance fetching a resource based on id. Since this will always return at most 1 result, you might think that there is no benefit in doing the extra filter on tenant id, but then you would be forgetting about the security aspect. Just because a request for a resource with a certain id is made, doesn't mean that the user has access to it. While this is very likely, you should always verify this, and it makes sense to have this verification done in the query itself. Since the resource does not exist from the user's POV, returning a 404 is also better than a 403, as the latter will leak information about the existence of the resource. This makes it that filtering in the database/query is not only viable but even preferred.