Danger, Will Robinson!

[アーティクル]
03/03/2011

As long-time readers of this blog know, I am often asked why a particular hunk of bad-smelling code does not produce a compiler warning.

"Why not?" questions are inherently hard to answer because they turn causation on its head; normally we ask what caused a particular thing to happen, not what caused a particular thing to not happen. Therefore, rather than attack that question directly I like to rephrase the question into questions about the proposed feature. (A warning is of course a feature like any other.) By answering these questions we can see whether the compiler team is likely to spend its limited budget on the feature rather than prioritizing something else. Some questions I think about when considering "why did you not implement this warning?" are:

Did someone think of it?

If no one on the compiler team ever thinks of the possibility of a particular warning then obviously it won't happen. As an example of that, I was recently asked why:

using(null) statement;

does not produce a warning. That code is legal; it is equivalent to

IDisposable temp = null;
try
{
statement;
}
finally
{
if (temp != null) temp.Dispose();
}

Of course the code is completely pointless; all it does is introduce a try-finally block that adds nothing useful to the program. Why don't we give a warning for this useless code? Well, the primary reason is that (to my knowledge) no one on the compiler team ever thought that a customer would do such a thing, and therefore, never thought to design, implement, test and document a compiler feature to deal with the situation.

Now that we have thought of it, we can see if it meets any of our other criteria, like, is it plausible that a user would type this accidentally thinking that it does something sensible? But I'm getting ahead of myself.

Is the proposed warning even possible? Is it really expensive?

Sometimes people propose warnings that would require solving the Halting Problem (which is unsolvable in general) or solving an NP-hard problem (which is in practice unsolvable in a reasonable amount of time). Proposed warnings like "warn me if there is no possible way for this overload to be chosen" or "warn me if there is an input which guarantees that this recursive method has unbounded recursion" or "warn me if this argument can ever be null in this program" require a level of analysis that is either in principle impossible, or possible but better performed by tools custom made for that purpose, like the Code Contracts engine. Warning scenarios should be cheap for the compiler to disambiguate from non-warning scenarios.

Is the code being warned about both plausible and likely to be wrong? Is the potentially "wrong" code actually sensible in some scenario?

There are an infinite number of ways to write completely crazy code; the point of warnings is not to identify all possible crazy code, but rather to identify code that is plausible but likely to be wrong. We don't want to waste time and effort implementing features warning about ridiculous scenarios that in practice never arise.

Sometimes code seems to be both plausible and wrong, but is intentional for a reason that is not immediately obvious. The best example of this is our suppression of warnings about write-only locals; a local might be read by the developer during debugging even if it is never read in the program.

Machine-generated code is also often crazy-seeming but correct and intentional. It can be hard enough to write code generators that generate legal code; writing code generators that change their output when they determine that they're generating code that could produce warnings is burdensome. (For example, consider the machine-generated code that you get when you create a new project in Visual Studio. It had better compile without warnings!) However, we are much more concerned with catching errors in code written by humans than we are about making machines happy.

Obviously we only want to warn about code that has a high likelihood of actually being wrong. If we warn about correct code then we are encouraging users to change correct code, probably by changing it into incorrect code. A warning should be a bug-preventing feature, not a bug-encouraging feature. This leads me to my next point:

Is there a clear way to rewrite correct code that gets an unwanted warning into correct code that does not get a warning?

Warnings ideally ought to be easily turn-off-able. For example:

bool x = N();
...
if (x == null ) Q();

That gives a warning that == null on a non-nullable value type is always false. Clearly you can rewrite the code to eliminate the warning; if you intended it to always be false then get rid of the whole statement, or you can turn the variable into a nullable bool, or whatever.

A warning where there is no way to write the code so that the warning disappears is very irritating to people who actually do mean the seemingly-wrong thing. You can always turn a warning off with a #pragma of course, but that's a horribly ugly thing to force someone to do.

Will a new warning turn large amounts of existing code into errors?

Lots of people compile with "warnings as errors" turned on. Every time we add a warning that causes a lot of existing code to produce warnings, we potentially break a lot of people. Even if the warnings are good ones, breaking a lot of people is points against doing the feature. Such warnings are perhaps best handled in another tool, which brings me to:

Does the compiler team have to do this work? Or can another tool produce the warning?

Microsoft provides tools like FxCop, StyleCop and Code Contracts to do far more extensive analysis of code than the C# compiler performs. Similarly with third-party tools like ReSharper. If a warning can be implemented by one of those tools just as easily or more easily than it can be implemented in the compiler then that's less work for me, and that's awesome. Also, since FxCop examines compiled state, a warning in FxCop can catch problems in C# and VB, without having to change either compiler.

Will the user discover the error immediately upon testing anyway?

The earlier a bug is discovered, the cheaper it is to fix. The opportunities to find bugs are, in order of exponentially increasing cost of fixing the bug:

* When you type in the code
* When you compile the code
* When you test the code
* When your customer runs the code

Warnings about potential bugs that pop up while you are typing the code or compiling the code are bugs that never make it to the customer. However, a potential warning that points out a bug that would always be caught by testing is less compelling than a warning for a bug that could make it through testing to the customer.

For example, I was recently asked why the compiler does not provide a warning for this common typing error:

private decimal cost;
public decimal Cost { get { return this.Cost; } }

Whoops, that's a stack overflow exception (or, on machines with tail recursion optimizations, an infinite loop) waiting to happen right there. The compiler can in principle determine cheaply that property getter does nothing other than call itself, so why doesn't the compiler warn about it? We could do all the work of identifying code that has this pattern, but why bother warning about a bug that will be caught anyway? The instant you test this code you will immediately discover the problem and it will be clear how to fix it. This is unlike our earlier example of "if (x == null)", where the fact that there is unreachable code might go completely unnoticed during testing, particularly if the common scenario is for x to be false.

Summing up:

Since that is quite the gauntlet for a feature to run, we find that usually the right thing to do for a proposed warning is "don't add the warning". The most likely situation for lots of new warnings to be added is when the warnings are surrounding a new feature that has new ways of being misused; we added quite a few warnings in C# 4.0 for misuses of "dynamic", "no PIA", and "named and optional arguments". Warnings for new features have no existing code using them that would break.

That said, we can always improve; if you have ideas for warnings do keep them coming.

Comments

Anonymous
March 03, 2011
The comment has been removed
Anonymous
March 03, 2011
I can think of legitimate use cases for 'using(null)' - a conditional. e.g. using( ispathRemote ? new TemporaryPathPermission(path) : null)
{
// perform operation on path
} Without 'null' being a legitimate argument, this kind of conditional logic becomes much more difficult to express. Sure, that's a perfectly reasonable expression to put in a using block. But just plain "using(null)" is pointless. -- Eric
Anonymous
March 03, 2011
The comment has been removed
Anonymous
March 03, 2011
"Developers who are so unprofessional as to deploy code without running it also ignore compiler warnings. -- Eric" Sadly, yes, that's an excellent point.
Anonymous
March 03, 2011
"Will the user discover the error immediately upon testing anyway?" I'd rather find out about the warning before I start testing. It takes more work to fix a warning if it makes it into testing. So, warnings that save me testing effort do have value, albeit far less value than warnings which prevent my users from experiencing issues.
Anonymous
March 03, 2011
The comment has been removed
Anonymous
March 03, 2011
>>> bool x = N(); if (x = M()) Q(); That gives a warning that you probably meant "==", not "=". <<< Since when? In fact, one of the things I thought was nice about the lack of implicit conversions for integer types to bool was that it was much less likely to have the classic "assignment in 'if'" bug. So the compiler keeps quiet on assignments in 'if' statements. Is there some scenario in which your statement is actually true? I was misremembering; serves me right for not running the code through the compiler. -- Eric
Anonymous
March 03, 2011
I know this might sound silly, but one such program that might use this is a compiler unit test. Let's say I want to create a new C# compiler, called (with reverence to Mono), "Poly". If I write a unit test to verify that my compiler doesn't produce crashy code for using(null), my unit test might be: try { using(null) { } } catch { Assert.Fail(); } That's just about the only use I can think of.
Anonymous
March 03, 2011
"I hadn't thought of scenarios where you can write code but not debug it; I'll take your word for it that there are such scenarios. That is indeed points in favour of issuing a warning." I think it's a good idea to keep such scenarios in mind, of course. But I also think your other points about cost of implementing and maintaining such features in the compiler are important. It seems to me that, just as some warnings are better handled in external tools rather than the compiler, some debugging features are better handled in making the debugging tools better, rather than relying on a compiler warning. Fact is, having to write the code in a certain way just to make it easier to debug (e.g. the "write-only locals" post) or having the compiler warn about things that should be trivial to debug (e.g. the recursive property getter) points to a failing in the debugging tools. If it's dirt-cheap for the compiler to catch an issue, fine…include that as a feature. But I'd expect in most cases, even seemingly easy-to-implement warnings will involve a significant overhead: spec, implementation, testing, maintenance, all of this adds up. If it's harder to debug code when it's written a particular way, then the real solution is for the debugging tools to be better. Give me a way to inspect the return value a method is about to return. And for scenarios like debugging mobile apps, let's put pressure on those environments to provide better debugging tools. It's not the C#'s compiler to waste time generally, just to address semi-niche scenarios such as this. There is also of course the question of performance. One of the things I love about C# is how fast compilation is. There are no doubt lots of reasons for this, but surely every warning that's added to the analysis is going to slow compilation down by some amount. Again, if the detection of the warned-about code costs basically nothing at run-time (i.e. it falls naturally out of analysis the compiler is already doing and requires no additional work), then maybe it's worth putting in. But how many warnings are really like that? Even the recursive property getter seems likely to involve at least one special-case check for a call to a specific method (i.e. the getter itself) that would not have had to be done otherwise. So, please…just keep doing the excellent job being done so far. I find that the C# compiler currently has a very good balance of useful warnings and performance. Do listen to us customers of course, but please also continue to resist the temptation to put something into the compiler just because one of us asked for it. :)
Anonymous
March 03, 2011
@pete.d: That gives a warning because x is a bool, so there is no implicit conversion anywhere.
Anonymous
March 03, 2011
Might I +1 the recursive property warning? I've been bitten by it often enough, although it was always discovered fast enough that it caused no problem. It just feels like something the compiler should detect.
Anonymous
March 03, 2011
Why on earth do you still let users "Continue" when an application throws an unhandled exception at runtime? By definition, and Unhandled exception means that the application is in an unknown state.
Anonymous
March 03, 2011
The comment has been removed
Anonymous
March 03, 2011
You forgot the cheapest, step 0: *When you think about the code Designing a language feature that causes developers to think along successful patterns (async/await comes to mind) is probably the best way to prevent bugs from getting to the customer (along with hiring disciplined developers).
Anonymous
March 03, 2011
Hi Eric, I got blind-sided very recently with the issue that the compiler does not check/warn for duplicate values in an enum definition, which can lead to totally break a program even though unit tests are working if you do not explicitly test for duplicates. This is especially bad if it is just a forgotten value assignment. See this simple example: class Program { static void Main(string[] args) { Console.WriteLine("A:"+Test.A); Console.WriteLine("B:" + Test.B); } enum Test { A, B=0 } } Unfortunately the email function on your blog does not work....
Anonymous
March 03, 2011
@M.Weiser: Allowing duplicate values in an enum is a feature, not a bug. There are scenarios where it's useful to allow duplicate values. One such one I ran into even just recently was in declaring a enum types to represent Unicode character classes and scripts. It's nice for the enum to support all the names defined in Unicode, and in Unicode they have multiple names for many of the same values, which then needs to be mirrored in the enum in order for the enum to provide maximum utility. I don't see how allowing duplicate values would "totally break a program". Sounds like programmer error to me.
Anonymous
March 03, 2011
Many native enums (for P/Invoke) also have multiple names for the same value.
Anonymous
March 03, 2011
I agree with Jon on the self referential property overflow being non obvious in some cases. If you're using ScriptSharp, the end result runs in the browser, where a stack overflow just disappears and gives no warning. It took me quite some time to finally track the problem down, and I was a little surprised that the C# compiler hadn't warned me about it.
Anonymous
March 03, 2011
@Pete - I still think it should yield a warning if the same value is repeated. The problem for me is that it is not obvious in the case default values and automatic values are mixed accidently - if you are really sure that you want this, then you may suppress the warning. See my example program for the side effect. One would never assume that it will show Test.A twice.
Anonymous
March 04, 2011
The comment has been removed
Anonymous
March 04, 2011
One point about enum repeated values - it wouldn't be hard to write a unit test to go through all the enums in a given assembly and check that they didn't have any repeated values.
Anonymous
March 04, 2011
Add as much static code analysis as possible. It's value is well proven with the embedded C/C++ tools dating back to ~1980.
Anonymous
March 04, 2011
The comment has been removed
Anonymous
March 04, 2011
Don't want to be disrespectful, but "I was misremembering; serves me right for not running the code through the compiler" is something that a "developer who is so unprofessional as to deploy code without running it" would say :)
Anonymous
March 04, 2011
The comment has been removed
Anonymous
March 05, 2011
The comment has been removed
Anonymous
March 05, 2011
The comment has been removed
Anonymous
March 05, 2011
The comment has been removed
Anonymous
March 06, 2011
FYI, the VC++ compiler team thought that the self-recursive case was worth a warning: C4717 'function' : recursive on all control paths, function will cause runtime stack overflow
Anonymous
March 06, 2011
"VC++ compiler team thought that the self-recursive case was worth a warning:" Of course they did. That's one of the reasons I dislike the C++ compiler so much. :p And even compiling without optimizations, the C++ compiler takes forever compared to compiling similar code in C#. I can't help wonder how much of that time is spent doing analysis for those kinds of warnings. I have no first-hand knowledge of any C++ compiler internals, but I strongly doubt that it is the warnings causing a perceived performance issue. But more generally, you are comparing apples with oranges when you compare a C# compiler to a typical "native" C++ compiler for performance. They have very different tasks to perform. A C# program must generate complete metadata and IL for the entire program; a C++ compiler must generate object code for each individual source file and then link them later. Those two tasks have potentially very different performance characteristics. C# relies upon a JIT compiler doing all the heavy lifting of translating intermediate code to optimized machine code; most C++ compilers do not have the luxury of being able to skip the optimizations. C++ compilers are written to assume that each object file can be compiled independently by re-parsing the headers, and the link step that resolves inter-file dependencies happens later; C# was designed so that all the "top level" metadata is generated first and then each method body compiled can make efficient use of that in-memory metadata metadata. That these compilation models are fundamentally different seems to me to be a much more likely cause of performance differences than trivialities like what warnings are detected. -- Eric
Anonymous
March 06, 2011
"Developers who are so unprofessional as to deploy code without running it also ignore compiler warnings. -- Eric" The warning could be very useful if the organization compiles with "warnings as errors" turned on. So even though the developer is lazy and deserves it, the organization is still spared the bug.
Anonymous
March 06, 2011
Two points:

You don't have to break existing code when you introduce new warnings if you change your warning level to include some indication of the compiler version. E.G. Treat warnings as errors: 'Those introduced in VS2010 and earlier' etc
The recursive property/stack blowing mistake can be expensive to find - especially if the only symptom you have is some WCF service going down. Hours. Yet it is so cheap for you to spot. I have never understood why this does not meet the cost/benefit tradeoff.

Anonymous
March 07, 2011
"(For example, consider the machine-generated code that you get when you create a new project in Visual Studio. It had better compile without warnings!)" Then why do the templates compile with warnings if I have "XML documentation file" turned on?
Anonymous
March 07, 2011
@pete.d The difference is that C++ is at a whole different level of complexity than C# is. There is also the fact that we C++ devs can use the code is compiling as an excuse to have a nice long coffee break and the boss won't know any different.
Anonymous
March 07, 2011
@Ted: "Add as much static code analysis as possible" 100% Correct...The falacy is that the compiler is the appropriate place. @Yitzhak Steinmetz: "The warning could be very useful if the organization compiles with "warnings as errors" turned on. So even though the developer is lazy and deserves it, the organization is still spared the bug." Once again, it is not necessary to do this in the compiler. Simply use Gated-Checkins, and make sure all of the desired analysis is performed. If anything does not pass, it can notget into the Source Repository.
Anonymous
March 07, 2011
"especially if the only symptom you have is some WCF service going down." I don't understand this comment. I have found WCF's trace information to be very useful in tracking down all sorts of weird problems. A stack overflow exception, which would result in the exception and stack trace being emitted in the log (either the WCF trace output itself, or at the very least the event log), should be very easy to track down. In any case, the fact is that debugging services is difficult generally. Logging helps quite a lot, but this is yet another example of "it's not the compiler's job". We need better tools for debugging services, not ad hoc features glommed onto the compiler. "The difference is that C++ is at a whole different level of complexity than C# is. There is also the fact that we C++ devs can use the code is compiling as an excuse to have a nice long coffee break and the boss won't know any different." I can't tell if you're being facetious or not. C++ certainly is not "a whole different level of complexity" from C# and obviously your second point is no way to design compilers, so it must be tongue-in-cheek. Do you really believe there's some excuse for C++ code to take so much longer to compile than C# code?
Anonymous
March 07, 2011
"no really, as far as the compiler is concerned, it is." No, really, it's not. C# has to deal with generics, which the C++ compiler does not, assuming it's not compiling managed code. And templates are really just glorified preprocessing, no more complex than a macro. But even if templates added to the complexity in some significant way, it's not a relevant distinction if I'm not compiling code with templates in it. More to the point: regardless of the balance of performance cost of various features in the C++ compiler, the fact remains that compiling C++ code takes WAY longer than comparable C# code (not to mention that I have never crashed the C# compiler, but have done so on a regular basis with the C++ compiler). For warnings that do nothing more than detect something that would be detected the instant I actually run the code, it's a waste of time and code complexity in the compiler.
Anonymous
March 07, 2011
C++ templates are compiled for each type they are instantiated for. .NET generics are compiled once with a placeholder. There is a difference there. The only way to get C++ to take a real long time to compile is by using templates a lot, and if you use the standard C++ library then you are using templates a lot. Link time code generation also can add to the build time too, but again, it is worth it for the speed boost. But in cost VS performance, having long running codepaths complete in 290ms rather than 850ms is the kind of thing that C++ developers want. So we don't mind the extra time spent on compiling.
Anonymous
March 07, 2011
>> templates are really just glorified preprocessing, no more complex than a macro. This is misleading in general because of things like two-phase lookup (which, granted, VC++ in particular does not implement).
Anonymous
March 07, 2011
"This is misleading in general because of things like two-phase lookup" It's true, I am over-simplifying a bit. C++ templates have made significant progress since they originally appeared. But inasmuch as two-phase lookup is mainly a way to provide some validation for a template during declaration rather than instantiation, I don't feel it changes the fundamental nature of templates. Fact is, a) I don't find "templates, templates, templates" a valid excuse for why C++ compilation is so much slower than C# compilation (especially in scenarios where templates aren't even being used), and b) regardless of why the C++ compiler is so slow, I don't care for the C# compiler to be bogged down with warnings that detect things that will be found the first time the code is executed. I apologize if I've offended all the C++ apologists out there. Let's just assume for the sake of argument that the C++ compiler is blindingly fast considering all the incredibly complex and unique features it has to support. Fine by me. I still don't want the C# compiler being dragged down by warnings that aren't really improving things that much.
Anonymous
March 07, 2011
The comment has been removed
Anonymous
March 08, 2011
Doesn't an empty try..finally block stop Thread.Abort interrupting the current thread? Perhaps, someone adverse to writing try..finally blocks but wants the thread to not be interrupted could use this syntax?? :p
Anonymous
March 08, 2011
I'd like Visual Studio to warn about a simple property getter or setter that's self recursive as soon as I type it.
Anonymous
March 08, 2011
@Matt: First of all, no it doesn't. Second, why would you avoid writing try/finally blocks unless you're purposefully writing bad code?
Anonymous
March 09, 2011
@configurator I was pretty sure that as of CLR 2.0 a call from another thread to Thread.Abort would not interrupt a try finally block from executing, must go find reference document... The second part was was me being facetious :p
Anonymous
March 10, 2011
@Matt: It would interrupt the try, just not the finally.
Anonymous
March 11, 2011
@configurator: That's right! I remember now! stackoverflow.com/.../why-use-try-finally-with-an-empty-try-block

次の方法で共有

Danger, Will Robinson!

Comments

その他のリソース