Why are local variables definitely assigned in unreachable statements?
You're probably all familiar with the feature of C# which disallows reading from a local variable before it has been "definitely assigned":
void M()
{
    int x;
    if (Q())
        x = 123;
    if (R())
        Console.WriteLine(x); // illegal!
}
This is illegal because there is a path through the code which, if taken, results in the local variable being read from before it has been assigned; in this case, if Q() returns false and R() returns true.
The reason why we want to make this illegal is not, as many people believe, because the local variable is going to be initialized to garbage and we want to protect you from garbage. We do in fact automatically initialize locals to their default values. (Though the C and C++ programming languages do not, and will cheerfully allow you to read garbage from an uninitialized local.) Rather, it is because the existence of such a code path is probably a bug, and we want to throw you in the pit of quality; you should have to work hard to write that bug.
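As a minimal sketch of what the rule asks of you, here is one way to make the first example compile: assign x on every path before it is read. The bodies of Q and R and the fallback value -1 are assumptions invented for illustration; they are not part of the original example.

```csharp
using System;

static class Program
{
    // Hypothetical stand-ins for the conditions in the example above.
    static bool Q() => false;
    static bool R() => true;

    internal static int M()
    {
        int x;
        if (Q())
            x = 123;
        else
            x = -1; // assumed fallback: now every path assigns x before the read

        if (R())
            Console.WriteLine(x); // legal: x is definitely assigned here

        return x;
    }

    static void Main()
    {
        M(); // with the stand-ins above, this prints -1
    }
}
```

Whether a fallback value is the right fix depends on what the code is supposed to do; the point is only that the compiler makes you decide rather than silently handing you the default.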
The way in which the compiler determines if there is any path through the code which causes x to be read before it is written is quite interesting, but that's a subject for another day. The question I want to consider today is: why are local variables considered to be definitely assigned inside unreachable statements?
void M()
{
    int x;
    if (Q())
        x = 123;
    if (false)
        Console.WriteLine(x); // legal!
}
First off, obviously the way I've described the feature immediately gives the intuition that this ought to be legal. Clearly there is no path through the code which results in the local variable being read before it is assigned. In fact, there is no path through the code that results in the local variable being read, period!
On the other hand: that code looks wrong. We do not allow syntax errors, or overload resolution errors, or convertibility errors, or any other kind of error, in an unreachable statement, so why should we allow definite assignment errors?
It's a subtle point, I admit. Here's the thing. You have to ask yourself "why is there unreachable code in the method in the first place?" Either that unreachable code is deliberate, or it is an error.
If it is an error, then something is deeply messed up here. The programmer did not intend the written control flow in the first place. It seems premature to guess at what the definite assignment errors are in the unreachable code, since the control flow that would be used to determine definite assignment state is wrong. We are going to give a warning about the unreachable code; the user can then notice the warning and fix the control flow. Once it is fixed, then we can consider whether there are definite assignment problems with the fixed control flow.
Now, why on earth would someone deliberately make unreachable code? It does in fact happen; actually it happens quite frequently when dealing with libraries made by another team that are not quite done yet:
// If we can resrov the frob into a glob, do that and then blorg the result.
// Even if the frob is not a glob, we know it is definitely a resrovable blob,
// so resrov it as a blob and then blorg the result. Finally, fribble
// the blorgable result, regardless of whether it was a glob or a blob.
void BlorgFrob(Frob frob)
{
    IBlorgable blorgable;
    // TODO: Glob.TryResrov has not been ported to C# yet.
    if (false /* Glob.TryResrov(out blorgable, frob) */)
    {
        BlorgGlob(blorgable);
    }
    else
    {
        blorgable = Blob.Resrov(frob);
        BlorgBlob(blorgable);
    }
    blorgable.Fribble(frob);
}
Should BlorgGlob(blorgable) be an error? It seems plausible that it should not be an error; after all, it's never going to read the local. But it is still nice that we get overload resolution errors reported inside the unreachable code, just in case there is something wrong there.
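To make the distinction concrete, here is a small sketch contrasting a constant-false condition with a condition that merely happens to be false at runtime. The method Q, the local named never, and the value 456 are assumptions for illustration. The commented-out lines show the variant the compiler rejects, because a non-constant local is not a constant expression and so the statement it guards counts as reachable.

```csharp
using System;

static class Program
{
    // Hypothetical stand-in for any runtime condition.
    static bool Q() => false;

    internal static int M()
    {
        int x;
        if (Q())
            x = 123;

        // The condition is the constant false, so the guarded statement is
        // unreachable. The compiler warns (CS0162, unreachable code) but
        // reports no definite assignment error for the read of x.
        if (false)
            Console.WriteLine(x);

        // By contrast, uncommenting the lines below produces error CS0165
        // (use of unassigned local variable 'x'), because 'never' is an
        // ordinary local, not a constant.
        // bool never = false;
        // if (never)
        //     Console.WriteLine(x);

        x = 456; // assign before the reachable read below
        return x;
    }

    static void Main()
    {
        Console.WriteLine(M()); // prints 456
    }
}
```

This is also why an edit elsewhere in a method can surface a definite assignment error in code you did not touch: once the compiler can no longer prove the statement unreachable, the ordinary rule applies again.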
Comments
Anonymous
March 05, 2012
When I started using C#, I felt like I was being thrown into a pit of quality.
Anonymous
March 05, 2012
I guess there's also a question of "how far" the compiler should look when it comes to determining whether or not something is unreachable - in the general case this requires solving the halting problem, so there needs to be some limit there. And while "I can't determine whether this is reachable or not" is fine for issuing a warning (there's no major consequence if you choose "I can't determine" instead of "definitely unreachable"), it's a whole lot less fine if your build fails when something isn't definitely-unreachable.
Indeed. See http://blogs.msdn.com/b/ericlippert/archive/2011/02/24/never-say-never-part-two.aspx for some thoughts on reachability and the halting problem -- Eric
Anonymous
March 05, 2012
"Should BlorgGlob(blorgable) be an error?" If prolonged exposure to the VS11 Beta UI has reduced Eric to spouting this kind of gibberish on his blog, don't I have bigger things to worry about? :)
We're still using VS10 on the Roslyn team. We start dogfooding VS11 soon. I'm reserving judgment until I actually use the thing! :-) -- Eric
Anonymous
March 05, 2012
That method description is one of the most entertaining things I have read in a while.
Anonymous
March 05, 2012
Weren't the spells named rezrov and blorb? If I remember Enchanter correctly...
Anonymous
March 05, 2012
"If you can't explain it simply, you don't understand it well enough." You sir, deserve a pat on the shoulder for this nice work.
Anonymous
March 05, 2012
So the answer to "Why are local variables definitely assigned in unreachable statements?" is: because we (the C# designers) think that you are messing something up. But one question: what, then, is the use of having a "default value" concept for local variables in C#?
Anonymous
March 05, 2012
Just the other day I changed something, and some unrelated code that I didn't touch started getting the definite assignment error. It was puzzling at first because I didn't touch the code that had an error, nor had I changed the semantics of the program. In the end I had changed something so that the compiler could no longer determine that the code was unreachable, so it started telling me about the error that had already existed in it. I was going to mention something about it, but apparently you have mental telepathy so I didn't have to. Thanks a lot!
Anonymous
March 05, 2012
"Either that unreachable code is deliberate, or it is an error." Anything that is not deliberate is an error.
Anonymous
March 06, 2012
Thanks for an interesting post, Eric! The comments are simply too good to be true. I have often wondered about Microsoft's compilers (C, C++, C#) on this point. The example you give does produce a warning - which is often an error. However, the following example compiles with no warning:
static void Main(string[] args)
{
    var reachme = false;
    if (reachme)
        Console.WriteLine("Hallo world!");
}
The code is unreachable at all times - just as in your example - so why doesn't this emit a warning? The Watcom compiler did that with code like this... Just a question spawned from your post.
Anonymous
March 06, 2012
"We do in fact automatically initialize locals to their default values." Hello! Why do you do that, if we need to "definitely assign" a local to read from it?
Anonymous
March 06, 2012
@Vlad I believe you need to set the Initialize locals flag to get verifiable code. This is a CLR level rule, whereas the definitely assigned stuff is a C# level rule. In many, but not all cases the just-in-time compiler will optimize out the unnecessary initialization.
Anonymous
March 06, 2012
@ChristianBjerre: Your unreachable code doesn't try to access any not-definitely-assigned variables, so there is no warning. (Or are you asking why the compiler isn't more aggressive with its unreachable code analysis? Think generics.)
Anonymous
March 07, 2012
Adding lots of static analysis to future VS editions can only help produce better code. For example, flagging local variables as 'Local variable X could be moved to inner scope' would be great.
Anonymous
March 07, 2012
BTW I'd like to express how much I love the definite assignment requirement. It makes it very easy to verify that all code paths are assigning to a variable, like in a complex if/then/else/switch. All I do is not assign when I declare, and let the compiler tell me I'm okay. This is also fantastic for later maintenance, because the compiler will yell at you if you made a change that is almost certainly unintentional.
Anonymous
March 08, 2012
@Raymond Chen: Yes, Raymond. That's my question. It's the unreachable code analysis that I'm on about - spawned from Eric's article. A more aggressive code analysis will improve lots of code.
Anonymous
March 09, 2012
@Christian Bjerre: Please remember that the unreachability analysis has to be easily and precisely describable and implementable, because it's part of the standard. If you have a different analyzer following different rules for each implementation, you can never change the implementation without risking introducing new errors/warnings. BTW, that includes Microsoft enhancing their compiler for the next better version. Or you might be forced to downgrade. Ever worked on a second pc?
Anonymous
March 09, 2012
@Deduplicator: This case is described in section "8.1 End points and reachability" of the standard. The standard uses the same example as I did (I didn't check before I wrote my example). The ability to analyse the code and point to code which is wrong according to a set of rules (that can extend the standard) is an important feature when you write code. Today we only have the Microsoft C# compiler to do this for us. Previously we had a set of C/C++ compilers implementing the same standard, but extending the code analysis with their own sets of rules. We used this extensively back then and got better code, as some rules were added as compiler-specific warnings but warned about general problems with the code. Any changes to the specific compiler did cause some work, but in all cases increased code quality. I'm not sure I understand your comment about the "second PC".
Anonymous
March 27, 2012
Why can't we re-use or re-declare a variable previously defined in a child scope, for/foreach loop, etc.?!
for (int i = 0; i < 5; i++) { }
int i = 1; // no!
i++; // no!!
Anonymous
August 19, 2012
"We do in fact automatically initialize locals to their default values." Except when two distinct variables are merged into one by the compiler. The CLR only initializes that single variable once. The second C# variable will start with the value last found in the first C# variable.