Struggling with unit tests (again)

I am having a hard time today because of unit tests. Not tests that fail... tests that don't exist. And tests I'm not sure should exist.

It is natural, right, and necessary to test methods that do a non-trivial transformation of data. Here is an example: given a list of events, output a list of the timespans between the events, excluding empty timespans.

The implementation of this method will involve conditionals, looping, and handling of edge cases in the input such as an empty or single-item list. Not only does it feel right to test such code, it is also very easy to actually test it, as long as you do the appropriate factoring - so that test code can supply or mock out dependencies.
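To make the example concrete, here is a minimal sketch of such a method, assuming events are represented by sorted timestamps (the type and function names here are hypothetical, not from the actual system):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Timespan:
    start: datetime
    end: datetime

def timespans_between(event_times: list[datetime]) -> list[Timespan]:
    """Return the spans between consecutive events, excluding empty
    (zero-length) spans. Assumes event_times is sorted ascending."""
    return [
        Timespan(earlier, later)
        for earlier, later in zip(event_times, event_times[1:])
        if later > earlier  # exclude empty timespans
    ]
```

Note that the empty-list and single-item edge cases fall out naturally (the zip yields nothing), which is exactly the kind of behavior a unit test is good at pinning down.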

In my scenario this feels mostly like testing the implementation of the system. It is not a true end-to-end scenario: while the code helps satisfy some requirement of the system, and verifying its logic provides reassurance that the system overall meets its requirements, it isn't meeting a requirement per se.

However, there are a couple of odd things I noticed while thinking about this.

I ended up writing and rewriting the code a bit so that it was easily testable, and found that I introduced new data classes that capture more specifically the input/output required of this particular piece of code. That meant writing a little glue code that runs before and after: first to squeeze the data into exactly the right shape for processing, then to re-squeeze it into the final output format.

I have the following doubts about what I just did:

· That glue code on the input side shouldn't really be necessary – I should be able to get or produce the input data in exactly the right form in the first place

· That glue code on the output side is somewhat necessary, because it also has other functionality: it merges yet more data into the timespans while simultaneously putting things into a slightly different format required by another system. However, this could have been done with fewer types in the picture if I had generated the output type directly from the code under test

· What I actually built was a pipeline – reformat data, process data, reformat data yet again, merge in more data. The good parts of my code – the bits of the application that MUST survive in some form no matter how I refactor it, and probably the best bits to test – are the actual changes to the data that happen, not the ‘data contracts’ or ‘schemas of data’ in between those pieces of pipeline
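That pipeline shape can be sketched as follows. Every name here is illustrative rather than the actual code; the point is that only the middle stage carries logic worth pinning down with tests:

```python
def reformat_input(db_rows):
    # glue: squeeze DB-shaped rows into exactly the right shape
    return sorted(row["ts"] for row in db_rows)

def compute_timespans(times):
    # the transformation that matters: spans between events, no empty spans
    return [(a, b) for a, b in zip(times, times[1:]) if b > a]

def merge_extra_data(spans, labels):
    # glue + merge: attach more data while re-squeezing into the output format
    return [{"start": a, "end": b, "label": labels.get(a, "")} for a, b in spans]

def run_pipeline(db_rows, labels):
    return merge_extra_data(compute_timespans(reformat_input(db_rows)), labels)
```

Seen this way, `reformat_input` and `merge_extra_data` are conventions between stages, while `compute_timespans` is the part that must survive any refactoring.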

Where am I going with this? Testing the trivial reformatting of data from the input form to an intermediate form doesn't feel that valuable. Testing more trivial reformatting of data from a second intermediate form to the final output form doesn't feel valuable either. Yes, there can be bugs in this code. But we're testing something that doesn't matter to the customer. It's better to avoid writing such a test if you can instead write a test of the same code that also proves something that does matter to the customer.

Actually… the input form and output format of the data don't really matter to the customer either. They are just conventions of the code for dealing with a database and some other service… what matters is the actual data flowing through the system.

OK. And now to the second part of my dilemma… I also have a manager asking me if I can get 100% code coverage – and if not, OK, why not 80%?

So I feel caught in a pinch between two basic issues: 1) it is necessary to prove every single piece of my code is correct; 2) most of my code is so arbitrary that testing it can't be said to provide end-user value. In fact, done the wrong way, testing provides negative value, since it slows down my ability to change arbitrary code into different arbitrary code that solves some new issue.

How do I get out of this pickle?

Here is my initial idea of how to move forward:

1. write my first tests of some code I know matters

2. measure code coverage, and try to find some more code that isn’t covered at all, but definitely matters to final correctness

3. refactor that code to make it a testable, independent piece that can, once tested, survive unchanged forever – adding some glue code if necessary

4. figure out some way to remove the glue code? Or write some very stupid throwaway tests that prove this code is correct right now, but that should probably be deleted as soon as they fail.

I’ll let you know if that experiment turns up anything interesting…

Comments

  • Anonymous
    February 12, 2014
    Hi Tim, here is my brain dump of thoughts.
  1. Glue code is not necessarily bad. In fact, needing it might even indicate that the SUT was breaking the Single Responsibility Principle. Breaking up code like this also makes it cleaner and easier to maintain. This kind of code might also have other uses. For example, I recently created an ApiConnector class which is injected into several clients of REST APIs from a third-party provider. I could have had the constructors take value types, but a connector type that wraps them lets me register the connector in an IoC container just once, and it will be auto-injected into the clients without me having to register each client in the container. If the clients took the config values in their constructors, I would have to register each client manually.
  2. Prefer testing application code to not testing it. Think of it this way: 99% of bugs will come from the 10% of code that isn't covered by test automation. It is also a trap to think that simple conversion code doesn't need to be tested because it is small and simple – I have caught several silly bugs by covering such code with tests.
  3. Cover the application code with acceptance/scenario tests even though the individual pieces are unit tested. This ensures that all the pieces work correctly together to satisfy the system requirements.
  4. Understand what code coverage is and isn't. It is often a misdirection for developers: code coverage only tells you what is not tested, rather than what is tested. For example, a single unit test could execute a code block such as "if (x <= y)" and thereby give us 100% coverage. The coverage number is misleading here because at least three different unit tests are required to cover all the scenarios that this single code block handles.
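The commenter's coverage point can be sketched like this (a hypothetical function, not code from the post):

```python
def cap(x, limit):
    # A single test with x > limit executes every line here, reporting
    # 100% line coverage, yet it never exercises the x <= limit path
    # or the boundary case x == limit.
    if x > limit:
        x = limit
    return x

# The three cases the coverage number hides:
assert cap(5, 3) == 3  # x > limit (the one test that yields "100%")
assert cap(2, 3) == 2  # x < limit
assert cap(3, 3) == 3  # x == limit (boundary)
```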
  • Anonymous
    February 12, 2014
    Personally, I tend to think that the main value the unit tests provide is not detecting bugs - that's the secondary (but still very important) value. The main value is this: "I ended up writing and rewriting code a bit so that this code was easily testable, and found that I introduced new data classes that capture more specifically the input/output required of this particular piece of code." I find this to happen again and again, whenever I introduce tests into a previously uncovered area: it forces me to change the design of the classes and makes them a lot better. Honestly, I can't find words to describe how hugely the design of those classes is forced to improve. I will confess to never care about code coverage so far - I rarely work on greenfield development so test coverage is abysmal - so I can't comment on the 100% requirement.

  • Anonymous
    February 13, 2014
    There might be an interesting E2E test there, if you can supply data in DB format and verify it in the service format. In an ideal world that falls nicely within your test suite and responsibility, though that's as messy in the real world as code theories. You could then also throw known bad data at it, or fuzz the pipeline, which could turn up unexpected real bugs.
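A sketch of that fuzzing idea, using a toy stand-in for the real pipeline (all names here are illustrative): feed it randomized, including degenerate, DB-shaped input and assert invariants rather than exact outputs.

```python
import random

def pipeline(db_rows):
    # toy stand-in for the real DB-to-service pipeline
    times = sorted(row["ts"] for row in db_rows)
    return [{"start": a, "end": b} for a, b in zip(times, times[1:]) if b > a]

def fuzz_pipeline(trials=200):
    for _ in range(trials):
        rows = [{"ts": random.randint(0, 10)}
                for _ in range(random.randint(0, 6))]
        for span in pipeline(rows):
            # invariant: no empty spans survive, whatever the input
            assert span["end"] > span["start"]

fuzz_pipeline()
```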