Thursday, November 13, 2014

Code Coverage - Good or Evil?

Probably every serious software project has some kind of code coverage measurement in place, ideally integrated into the Continuous Integration landscape, so you can see which code is executed by the automated tests.

But why do we have this feature? What is the real benefit, and are there maybe even downsides or risks? Let's have a closer look.

At first glance, it sounds quite obvious. One look and you know to what degree your software is tested by automated tests. The more the better. But is it that easy? Coverage just states that the code was touched - NOT verified. And there is no indication of what kind of test produced it. I think the type of test makes a huge difference. There are, for example ...

* Integration Tests (High Level) - If some code is executed by an integration test, the code seems to be at least relevant for that feature. But to what degree? Would the test fail if we comment out the code?

* Unit Tests (Low Level) - Since a Unit Test (at least a good one) covers much less code than an Integration Test, there is a good chance that the test really verifies the code.

* "Coverage Tests" - That is what I call tests without any assertion. For example an Integration Test just "clicking" on buttons or calling a service API. Most times I have seen this on Unit Test level. It sounds like a stupid thing to do, but I think it can happen even without realizing it.

A little example: After fixing a bug, I wondered how this could have happened. I was confident that the functionality was covered by tests. Why did no test fail? I realized that the assertion was not correct and there was actually no way the test could have failed.
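
As an illustration of how such a test can look, here is a small sketch (all names are invented, not the original code): the assertion compares the result with itself, so it can never fail, no matter what the code does.

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class OrderServiceTest {

    // Hypothetical classes, just enough for the sketch to compile.
    static class Order { String status = "NEW"; }
    static class OrderService {
        Order submit(Order order) { order.status = "SUBMITTED"; return order; }
    }

    // Broken assertion: it compares the result with itself instead of with the
    // expected value, so this test stays green no matter what submit() does.
    @Test
    public void submitChangesStatus_broken() {
        Order order = new OrderService().submit(new Order());
        assertEquals(order.status, order.status); // can never fail
    }

    // Fixed assertion: now the test would actually catch a regression.
    @Test
    public void submitChangesStatus_fixed() {
        Order order = new OrderService().submit(new Order());
        assertEquals("SUBMITTED", order.status);
    }
}
```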

That is why I have mixed feelings when it comes to code coverage. What you are really interested in is VERIFIED code, not COVERED code.

For me, the most helpful aspect of coverage is the trend over time. It doesn't matter what the absolute numbers are; as long as the trend keeps going up, it is a good sign. Another aspect is that code coverage can be used to check whether your assumptions were right. For example, if some code is not covered at all, but you thought it should be executed in some scenario, it might be worth a closer look.
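
A small hypothetical example of what that can look like: you assume your test scenarios exercise a fallback branch, but the coverage report shows it was never executed - so either the scenario is missing from the tests, or the branch is dead code.

```java
// Sketch of a hypothetical class where the coverage report can challenge an assumption.
public class ConfigLoader {

    // Happy path: covered by the integration tests, as expected.
    public String load(String key) {
        String value = System.getProperty(key);
        if (value == null) {
            return loadDefault(key); // assumption: "our tests exercise the fallback"
        }
        return value;
    }

    // If the coverage report shows zero hits here, the assumption was wrong:
    // either the fallback scenario is not tested yet, or this is dead code.
    private String loadDefault(String key) {
        return "default-" + key;
    }
}
```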

But code coverage can have its downsides as well, especially when people mix up metrics and KPIs. Let's look at the difference:

* A 'metric' is just one dimension you look at from the INSIDE to see where your project stands. As already mentioned, this is especially useful when looking at the trend.

* A KPI (Key Performance Indicator) is used to assess performance from the OUTSIDE. This implies the assumption that higher is always better. And exactly that can be very misleading when talking about coverage: 85% is not necessarily better than 80%. You have to understand where the coverage is coming from.

So from my point of view, coverage is a great and important metric, but a bad and sometimes even dangerous KPI.

Why dangerous? When coverage is treated as a KPI, it is only a small step to setting a coverage goal from the outside - and such a goal is easily met by writing exactly the kind of assertion-free "coverage tests" described above.

Overall, code coverage is very important to me for seeing where a project stands in terms of tests. But I also think it is important not to follow it blindly.

Lately, I have gotten more and more used to seeing it from another angle. 80% code coverage sounds good at first, but it still means that at least 20% of the codebase could be removed without a single test failing. Looking at it that way, it is sometimes even hard not to lose trust in automated testing...