Why Stepping Through Code Trumps Unit Testing
Over the past few years, there has been a big buzz about how writing unit tests (preferably via Test Driven Development) combined with code coverage is the silver bullet for keeping bugs out of the product. While the benefits of unit testing are undeniable, there are quite a few folks out there who advocate unit testing as a replacement for stepping through code. In some cases I have even heard stepping through code described as "old school". I've always been uncomfortable with that position and I finally encountered a bug that exposes it for the utter nonsense that it is. Consider the following code:
const string RegValue1 = "SomeValue";
const string RegValue2 = "SomeOtherValue";
int bufferSize = 255 * Marshal.SizeOf(typeof(char));
StringBuilder buffer = new StringBuilder(bufferSize);
int result = NativeMethods.RegQueryValueEx(hkey, RegValue1, IntPtr.Zero, IntPtr.Zero, buffer, ref bufferSize);
string value1 = buffer.ToString();
buffer = new StringBuilder(bufferSize);
result = NativeMethods.RegQueryValueEx(hkey, RegValue2, IntPtr.Zero, IntPtr.Zero, buffer, ref bufferSize);
string value2 = buffer.ToString();
buffer = new StringBuilder(bufferSize);
Were you able to spot the bug? I'll give you a hint: it only occurs if the string stored in the second registry value is larger than the first. If you still don't see it, don't worry. This bug is extremely difficult to spot if you are just looking at the code. The basic problem stems from the design of the RegQueryValueEx API. The last parameter (lpcbData) is the problem: it requires you to pass in the byte count of the buffer, but the function then overwrites that value with the number of bytes actually written on return. In the code above, bufferSize is reused across multiple calls. Although it starts out at the full intended size, the first call to RegQueryValueEx changes the value of bufferSize to the number of bytes written to the first buffer. So on the second call to RegQueryValueEx, the buffer size being passed in will likely be less than the intended size of the buffer. If the second registry value happens to be larger than the first, the call will fail with ERROR_MORE_DATA because the passed-in buffer size is too small.
As a side note, this is why designing an API with in/out parameters for anything but buffers is generally bad design. Buffers are about the only place where the developer mind-set expects the output to be different from the input. In just about any other case, such a design practically guarantees that callers will introduce this type of bug, because they will forget that their value got overwritten.
Back to the topic: the truly scary thing about this bug is that there is no reason to suspect that the relative sizes of the registry values would have any effect on how this code behaves. As a result, it is highly unlikely that anyone would ever think to test the case where the second registry value is longer than the first. Moreover, even if the developer did decide to write that test, they would have to correctly mock the behavior in RegQueryValueEx that causes the problem. Given that the developer's mind-set was clearly that they didn't care about the bytes written, it is unlikely that they would include that behavior in the mock--which means that their test results would be invalid. That is why mocking system APIs is generally not a good use of time. It should also be pointed out that if tests were written to exercise this code, we would see 100 percent code coverage even with inputs that would never find this bug--a good reminder that code coverage numbers aren't as meaningful as some would make them.
The bottom line is that this bug is unlikely to be discovered through intentional testing. The best case scenario would be to find this bug accidentally by simply being lucky in the test values chosen. While accidents can happen, the most likely scenario for a developer relying solely on testing (unit or otherwise) is that this bug ships to customers.
In our case, that is exactly what happened. Fortunately, the bug involved in-house code so we were able to deploy the fix immediately. It should be noted that it took a year for this bug to surface, and when it did, it completely broke us. Which makes this type of bug the ultimate nightmare scenario.
So how do we prevent a bug like this from shipping? Stepping through code is the ultimate answer, but before we go there, let's talk about how the compiler is the first line of defense. In this case, bufferSize should have been declared as a constant because it is clearly not intended to change. Had that been done, the developer would have seen a compiler error, because RegQueryValueEx doesn't accept a constant for the by-reference lpcbData parameter. Assuming the developer paid attention to the error (rather than just removing the const to get the code to compile), they would have quickly realized the problem and the bug would never have been coded in the first place.
But we're all human. Forgetting to declare bufferSize as const would be a common enough mistake, as would removing the const in response to the compiler error. The latter would result from the superficial analysis of "right, of course I can't pass a const by ref" while ignoring the implications of what by ref actually means. So that leaves stepping through code as the best line of defense against a bug like this. The problem with stepping through code is that developers either don't do it, or they don't understand how to do it correctly. Many developers do attempt to step through every line of code that they write, but they do so in a superficial way--they only look at return values. With this bug, that doesn't cut it. So let's talk about how to correctly step through code.
The first issue with stepping through code is simply keeping track of where we have been so we can ensure we've stepped through every line. I've found that the best way to do this is to place breakpoints on the first line of every code block. This includes not only functions, but every conditional or flow-control statement. As we finish stepping through each block, we remove its breakpoint. When no breakpoints remain, we know we have stepped through every line of code. Unit tests can be helpful in this regard because they can get us into code paths that we might not otherwise be able to enter normally. Alternatively, we can explicitly manipulate the instruction pointer and/or change values in the debugger to direct flow into a code path that we wish to step through.
While visiting every line of code is a good start, it is only half the battle. The other half is verifying that what each line actually does matches our original intent. Certainly, checking return values of functions is important, but checking inputs is equally important. If we only look at outputs, we are just guessing that the function behaved correctly. We are guessing because if we don't know what the inputs are, we have no basis for saying that the function did what we expected it to do.
Checking inputs would have resulted in this bug being caught because we would have seen that bufferSize
was not what we expected on the second call to RegQueryValueEx. The important thing to call out here is that proper stepping through code would have caught this bug *regardless* of whether the inputs were sufficient to actually cause the bug. This is the critical advantage with stepping through code--it allows us to find bugs that we didn't even know could exist (and therefore didn't think to test for).
As a side note, this is why embedding function calls as parameters makes life harder. When it comes time to step through code, the only way to verify the input is to evaluate the embedded function. This will usually require stepping in or (particularly if no sources are available for the embedded code) stepping through some disassembly. Storing the results of the embedded function in a variable and then passing that as a parameter is much easier to debug. In retail builds the compiler will optimize the generated assembly anyway, so we don't gain anything by attempting to optimize higher level code.
In addition to checking inputs it is also important to verify proper loop behavior. At a minimum, we should step through a few iterations of any loop to make sure our variables are behaving properly (such as loop counters). We should also step through code executing at the boundaries of our loop; i.e. step through execution where we are at the maximum possible iteration minus one, the maximum possible iteration, and then verify that the loop exits.
Conditional breakpoints are very useful here because they allow us to do things like break on the 1000th iteration of a loop.
Conditionals are another place where we have to be extra diligent. It is very important to verify both sides of a conditional. This is something that is easy to miss when "if" conditions lack an "else". In that case, I've found it helpful to add a breakpoint to the line following the conditional block and remove it when stepping through the case where the conditional is not satisfied.
The bottom line with stepping through code is that the goal is to prove that each line behaves as intended. There is more to it than what I have mentioned above, but hopefully this will help get folks in the right mind-set to actually step through code in a meaningful way. When done properly, stepping through code is the best defense against shipping bugs. It is the only way to get the complete picture of how a given block of code actually behaves. Stepping through code is not "old school" and it is not optional.
Unit tests are great, but by themselves they are not sufficient to keep bugs at bay.
EDIT:
I got an e-mail from a customer suggesting that the "Why" in the title should be changed to "When". My decision to use "Why" was based on my point that having unit tests doesn't excuse us from stepping through code. However, I think the comment raises an important point. That is, why do we write unit tests? Those who claim that unit tests are a substitute for stepping through code are missing the boat on unit testing. Unit tests are not about finding bugs, they are about preventing them. A unit test is just an active form of an assertion. Assertions verify design assumptions in code and fire an alert when those design assumptions are violated. However, assertions are completely passive; they lie dormant until the code path in which they reside is actually executed. The benefit of assertions is that they are not tied to any specific set of actions, so they will verify design assumptions regardless of how the code path was entered. Assertions are powerful and should be used liberally. However, we can also benefit from a more deterministic approach.
Unit tests fulfill the same function of validating design assumptions, but they do so in an active way. Unit tests actually force a code path to execute and verify the outcome. So by incorporating unit tests into our code we are able to define how the code should behave and prove definitively that it behaves that way. If changes are made that violate the design assumptions behind the code, a unit test will fail and alert us to the problem. Because unit tests are active, we can guarantee that our design assumptions are actually validated--rather than hoping we do the right things to exercise our assertions. So the primary reason for writing unit tests is to ensure that future changes to the code don't violate the original design assumptions and introduce new bugs.
One of the benefits of writing unit tests is that because we are thinking about testing, it forces us to consider scenarios that our code must handle that we might have ignored when we were writing it. So unit tests can help us find bugs in our existing code, but that is a secondary function. In fact, unit tests tend to give us a false sense of security--we think "hey, my code passes all of these tests, so it must be good." But what unit tests don't address are the unknowns--those tests that we didn't know we needed.
So I'm going to leave my title the way it is, but also acknowledge that in reality unit testing and stepping through code have fundamentally different (yet complementary) purposes. So in that sense, stepping through code only trumps unit testing when unit testing is being done for the wrong reasons.
Comments
Anonymous
August 04, 2008
> As a side note, this is why designing an API with in/out parameters for anything but buffers is generally bad design. Buffers are about the only place where the developer mind-set expects the output to be different from the input. In just about any other case, such a design practically guarantees that callers will introduce this type of bug, because they will forget that their value got overwritten.

Er... perhaps the fact that the argument is declared as "ref" - that is, an [In,Out] managed pointer - would make the developer remember that the values are supposed to flow both ways here? One could say that the problem is rather that C# does not, to my knowledge, allow one to declare a parameter as a managed pointer with an [In] specifier only, and so "ref" has to be abused in P/Invoke declarations, distorting its meaning.

Anonymous
August 04, 2008
Yes, ref should clue you in to the fact that the value can change, but IMO it's easy to ignore. I think you are spot on in your assessment--most of the time when we see ref in P/Invoke declarations, it's in the context of [In]. As a result, we are psychologically predisposed to think [In] rather than [In, Out], which makes it more likely that we would make this mistake. Thanks for that comment.

Anonymous
August 15, 2008
It might be a "P/Invoke acquired" disease, really. :) We don't use it much on our team (mostly doing Web development), and for all our developers, out & ref have the proper .NET meaning, without the need to second-guess. I wonder, perhaps it would be a good reason to avoid ref & out in P/Invoke declarations entirely, and go for pointers proper? Yes, this means "unsafe", but it's not like P/Invoke code is going to get any "unsafer" with pointers than with ref, anyway...