NoSQL: Document Store

Until recently I had no practical experience with NoSQL, as I had only ever worked with SQL databases. I knew that the different types of databases existed, but I had never really played around with them. However, my current project seems the perfect candidate for trying out NoSQL, since I have to deal with large data objects/files that aren't easily mapped to a rigid structure.

First of all, I want to explain the structure of the data. When I say that it is not rigid, I don't mean that the content can be anything it wants. There are still rules, but nesting, for instance, can go as deep as you want. Mapping this to an SQL structure may be possible, but querying it would become difficult. The structure that is there sits more on a meta level, where you describe the rules of the structure and not of the content.
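To make the idea of "rules on a meta level" concrete, here is a minimal sketch in Python. The rules are assumptions made up for illustration and are not the actual project format: every node must have a string `name` and may have a list of `children`, each of which follows the same rules. The nesting depth is unlimited, yet the structure is still checked.

```python
def is_valid_node(node):
    """Return True if the node follows the (assumed) structural rules at every depth."""
    if not isinstance(node, dict):
        return False
    # Hypothetical rule 1: every node has a string "name".
    if not isinstance(node.get("name"), str):
        return False
    # Hypothetical rule 2: "children" is optional, but must be a list if present.
    children = node.get("children", [])
    if not isinstance(children, list):
        return False
    # The same rules apply recursively, so any nesting depth is allowed.
    return all(is_valid_node(child) for child in children)


document = {
    "name": "root",
    "children": [
        {"name": "a", "children": [{"name": "a1"}]},
        {"name": "b"},
    ],
}

print(is_valid_node(document))                                         # True
print(is_valid_node({"name": "root", "children": [{"children": []}]}))  # False: child has no name
```

The rules say nothing about what the content is, only about how it may be shaped, which is why a fixed set of SQL tables is an awkward fit.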

As you may know from some older blog posts, I don't like storing large data (especially JSON or text data) in a database column. I don't see the purpose, and it becomes difficult to do anything useful with the data. As a small side note, I discovered that PostgreSQL offers ways to still query such data, which I will look into in the future and compare with MongoDB.

For my project I have chosen MongoDB as a document store. It is a whole new world where querying works completely differently, using the aggregate functions, and there is a lot to learn. I haven't done much in-depth performance testing, and searching for data based on an entity (as with Hibernate) doesn't seem to be supported.
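For readers who have not seen the aggregate style of querying, here is a small sketch. The documents and the field names `kind`, `items` and `size` are hypothetical. The `pipeline` list uses the real `$unwind` and `$group` stages and is what would be passed to `collection.aggregate(pipeline)` with a driver such as pymongo; the function below it is a plain-Python equivalent, included only so the example runs without a database.

```python
# Hypothetical documents; "kind", "items" and "size" are made-up field names.
documents = [
    {"kind": "report", "items": [{"size": 3}, {"size": 5}]},
    {"kind": "report", "items": [{"size": 2}]},
    {"kind": "note", "items": [{"size": 7}]},
]

# An aggregation pipeline is a list of stages, each transforming the stream of documents.
# With pymongo this would be run as: collection.aggregate(pipeline)
pipeline = [
    {"$unwind": "$items"},  # one output document per array element
    {"$group": {"_id": "$kind", "total": {"$sum": "$items.size"}}},  # sum sizes per kind
]


def total_size_per_kind(docs):
    """Plain-Python equivalent of the pipeline above, for comparison only."""
    totals = {}
    for doc in docs:
        for item in doc["items"]:  # corresponds to $unwind
            # corresponds to $group with $sum
            totals[doc["kind"]] = totals.get(doc["kind"], 0) + item["size"]
    return totals


print(total_size_per_kind(documents))  # {'report': 10, 'note': 7}
```

Compared to a SQL query, the pipeline reads as a sequence of steps rather than a single declarative statement, which is a large part of the learning curve.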

But I must admit that the little I have used MongoDB for so far has already enabled something that would not have been possible (or at least not easily) using a regular SQL database or by reading the data from the JSON files directly.

I still have some concerns, however, about the lack of linking. Where an SQL database has very strong linking and structure, MongoDB doesn't seem to have any structure or linking at all. I am of two minds about this. On the one hand, I like the rules and restrictions of an SQL database, which prevent wrong and invalid data from being inserted. On the other hand, I feel that this causes all rules to be duplicated and causes more work to keep them in sync. Actual useful linking versus duplication in an SQL database is a topic for another blog post, and it may surprise you.

For now I haven't yet taken the approach of duplicating information and embedding it in the data in MongoDB. I don't know if I will ever really go down that path; I may just keep references with ids instead of really duplicating the data. However, in some cases my current data is already duplicated, because changes should not be automatically reflected in other documents. Duplicating and embedding more information seems to be an optimization technique that you use to avoid doing extra look-ups. The downside of such an approach is that information duplicated many times becomes much harder to update, which is also the main reason I wouldn't opt for it.

This was a short description of my first experiences with using MongoDB as a document store, which went pretty well, aside from the trouble of learning a completely new system for fetching data. I know there are other document stores, and even many other types of NoSQL databases. I don't think I will diversify much more in the near future, but I might do some experiments with other databases just for the experience.

I know NoSQL databases are a hot topic, but I haven't seen them used much on the projects I have worked on. I would love to hear from other people about how they came into contact with NoSQL databases, what they used them for, and whether a standard SQL database is still the go-to first choice for them, as it is for me.