Legacy Codebase: Testing Approach
In the previous blog post I discussed the difficulties of working with legacy code that has no tests. Since we don't want to change or add code without automated tests covering it, we should first write tests for the existing code.
If the code is nicely structured and to your liking, just start writing tests. This can be a big investment, but since you do not plan to make big changes to the code, it is not wasted effort. It does, however, mean investigating the behavior of every class the code depends on, to learn their possible outcomes and verify that the class under test handles each of them properly.
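As an illustration of checking every outcome of a dependency, here is a minimal sketch in Python using the standard library's `unittest.mock`. The post does not name a language or mocking framework, so that choice is an assumption. `InvoiceService`, its `rate_client.get_rate` dependency and `RateUnavailable` are hypothetical stand-ins for an existing class and a class it depends on; each test pins down how the class reacts to one possible outcome of that dependency.

```python
import unittest
from unittest.mock import Mock


class RateUnavailable(Exception):
    """Hypothetical error the dependency may raise."""


class InvoiceService:
    """Hypothetical stand-in for an existing, well-structured legacy class."""

    def __init__(self, rate_client):
        self.rate_client = rate_client

    def total_in_eur(self, amount, currency):
        if currency == "EUR":
            return amount
        try:
            rate = self.rate_client.get_rate(currency)
        except RateUnavailable:
            return None
        if rate is None:
            return None
        return round(amount * rate, 2)


class InvoiceServiceTest(unittest.TestCase):
    # One test per outcome the (hypothetical) dependency can produce.

    def setUp(self):
        self.rate_client = Mock()
        self.service = InvoiceService(self.rate_client)

    def test_converts_with_known_rate(self):
        self.rate_client.get_rate.return_value = 0.5
        self.assertEqual(self.service.total_in_eur(10, "USD"), 5.0)

    def test_returns_none_when_rate_missing(self):
        self.rate_client.get_rate.return_value = None
        self.assertIsNone(self.service.total_in_eur(10, "USD"))

    def test_returns_none_when_client_raises(self):
        self.rate_client.get_rate.side_effect = RateUnavailable()
        self.assertIsNone(self.service.total_in_eur(10, "USD"))

    def test_skips_client_for_eur(self):
        self.assertEqual(self.service.total_in_eur(10, "EUR"), 10)
        self.rate_client.get_rate.assert_not_called()
```

Run it with `python -m unittest`. Listing the outcomes this way (a value, `None`, an exception) is roughly what the investigation of the dependencies should give you.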
If the class you want to test is too big and should be split into multiple smaller classes, each with its own responsibility, you can generally still start by writing tests for this class. The idea is that the tests you write now can easily be modified later when you split the class. The API of the class will not change; the only difference is that it starts to depend on the newly created classes. When you create these classes, make sure you mock them in the existing tests. Since they are new classes whose behavior is already covered by these tests, that should be straightforward. Most of the tests for the new classes will also just be parts of the tests you write now, albeit with a different verify section. Testing large classes isn't fun, and it may not be worth fully testing the entire class before doing the actual work. This is a trade-off you have to make, but at least make sure that the code you change or add is properly tested.
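A minimal sketch of the state after such a split, again assuming Python and `unittest.mock`. `OrderProcessor`, `DiscountCalculator` and `repository.save` are hypothetical names, and the discount logic is assumed to have lived inline in `place_order` before it was extracted. The public `place_order` API is unchanged, the existing test now mocks the extracted class, and the new class's test reuses the same input with a different verify section.

```python
import unittest
from unittest.mock import Mock


class DiscountCalculator:
    """Hypothetical class extracted from the former, larger OrderProcessor."""

    def discount_for(self, subtotal):
        # Logic assumed to have lived inline in OrderProcessor.place_order.
        return round(subtotal * 0.10, 2) if subtotal >= 100 else 0.0


class OrderProcessor:
    """Public API unchanged; it now delegates to the extracted class."""

    def __init__(self, repository, discounts=None):
        self.repository = repository
        # Default keeps existing callers working after the split.
        self.discounts = discounts if discounts is not None else DiscountCalculator()

    def place_order(self, items):
        subtotal = sum(price for _, price in items)
        total = subtotal - self.discounts.discount_for(subtotal)
        self.repository.save(items, total)
        return total


class OrderProcessorTest(unittest.TestCase):
    # The original test, with the newly extracted collaborator mocked.

    def test_applies_discount_and_saves(self):
        repository = Mock()
        discounts = Mock()
        discounts.discount_for.return_value = 12.0
        items = [("book", 70.0), ("lamp", 50.0)]

        total = OrderProcessor(repository, discounts).place_order(items)

        self.assertEqual(total, 108.0)
        discounts.discount_for.assert_called_once_with(120.0)
        repository.save.assert_called_once_with(items, 108.0)


class DiscountCalculatorTest(unittest.TestCase):
    # Same input as above; the verify section now checks the discount itself.

    def test_ten_percent_from_100(self):
        self.assertEqual(DiscountCalculator().discount_for(120.0), 12.0)

    def test_no_discount_below_100(self):
        self.assertEqual(DiscountCalculator().discount_for(99.0), 0.0)
```

Passing the collaborator through the constructor is one way to make it mockable; it is a design choice for this sketch, not something the post prescribes.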
So far, the code we looked at was fairly easy to test. Code with many dependencies can be much harder. I have been down a rabbit hole where I kept having to mock one more thing to get the test to work. The test became very large, and it was almost impossible to understand what it actually did. Note, however, that the first test typically requires the most effort: as soon as you figure out what you have to mock and why, adding more tests becomes easier.
Another option is to skip unit tests for now and write integration tests instead. Find a spot where you can easily mock something (database access, for instance) and cut the system off there. When you start splitting off parts to reduce the dependencies, you can use your existing tests as a starting point. You will have to change the inputs to match the new component, but the mocked parts can be reused. In the test for the component you split it off from, you can now mock the new component, which you understand. This splits your initial test into two tests, one for each component. As you clean up the split-off component and come to understand its dependencies and how they work, you can split its test in two again: one for the component and one for the dependency. Eventually you end up with small, clean tests for each component in your system.
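The cut-off-and-split approach can be sketched as follows, under the same assumptions (Python, `unittest.mock`, hypothetical names). The database is represented by a mocked object whose hypothetical `fetch_sales` method returns rows, and `SalesAggregator` is the component assumed to have been split off from a larger `ReportService`. The first test is the coarse one that only mocks database access; the other two are what it splits into: the aggregator test reuses the mocked database, and the `ReportService` test mocks the new component instead.

```python
import unittest
from unittest.mock import Mock


class SalesAggregator:
    """Hypothetical component split off from ReportService; owns DB access."""

    def __init__(self, db):
        self.db = db

    def totals_by_product(self, region):
        totals = {}
        for product, amount in self.db.fetch_sales(region):
            totals[product] = totals.get(product, 0) + amount
        return totals


class ReportService:
    """Hypothetical legacy entry point; now depends on SalesAggregator only."""

    def __init__(self, aggregator):
        self.aggregator = aggregator

    def top_product_line(self, region):
        totals = self.aggregator.totals_by_product(region)
        if not totals:
            return f"{region}: no sales"
        product = max(totals, key=totals.get)
        return f"{region}: {product} ({totals[product]})"


def fake_db():
    # The original cut-off point: only database access is mocked.
    db = Mock()
    db.fetch_sales.return_value = [("tea", 5), ("coffee", 7), ("tea", 4)]
    return db


class ReportIntegrationTest(unittest.TestCase):
    # The coarse test through the real components, mocking only the DB.
    def test_top_product_line(self):
        service = ReportService(SalesAggregator(fake_db()))
        self.assertEqual(service.top_product_line("north"), "north: tea (9)")


class SalesAggregatorTest(unittest.TestCase):
    # Split-off test: the mocked DB is reused; inputs and checks change.
    def test_sums_per_product(self):
        db = fake_db()
        totals = SalesAggregator(db).totals_by_product("north")
        self.assertEqual(totals, {"tea": 9, "coffee": 7})
        db.fetch_sales.assert_called_once_with("north")


class ReportServiceTest(unittest.TestCase):
    # The remaining test now mocks the split-off component instead of the DB.
    def test_picks_highest_total(self):
        aggregator = Mock()
        aggregator.totals_by_product.return_value = {"tea": 9, "coffee": 7}
        line = ReportService(aggregator).top_product_line("north")
        self.assertEqual(line, "north: tea (9)")

    def test_handles_no_sales(self):
        aggregator = Mock()
        aggregator.totals_by_product.return_value = {}
        line = ReportService(aggregator).top_product_line("north")
        self.assertEqual(line, "north: no sales")
```

Once the two focused tests cover the same behavior, you can decide whether the coarse test is still worth keeping.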
This was a more detailed description of how I would go about testing legacy code and what the options are. Note that no matter what you do, testing a big pile of code isn't much fun. It can be a grind, and there is no way around that. By applying common sense, however, you can limit the effort required at any one time and spread it out. I will discuss this further in the next blog post, where I focus more on the TDD part.