null is not false, part two
In Raymond Smullyan's delightful books about the Island of Knights and Knaves -- where, you'll recall, knights make only true statements and knaves make only false statements -- the knights and knaves are of course clever literary devices to explore problems in deductive (*) logic. Smullyan, to my recollection, never explores what happens when knights and knaves make statements which are disingenuous half-truths, authorial license in pursuit of a larger truth, or other forms of truthiness. A nullable Boolean in C# gives us, if not quite the notion of truthiness, at least the notion that true and false are not the only possible values of a predicate: there is also "null", whatever that means.
What does that mean? A null Boolean can mean "there is a truth state, but I just don't know what it is": for example, if you queried a database on December 1st to ask "were the sales figures for November higher than they were in October?" the answer is either true or false, but the database might not know the answer because not all the figures are in yet. The right answer in that case would be to say "null", meaning "there is an answer but I do not know what it is."
Or, a null Boolean can mean "the question has no answer at all, not even true or false". True or false: the present king of France is bald. The number of currently existing kings of France -- zero -- is equal to the number of currently existing bald kings of France, but it seems off-putting to say that a statement is "vacuously true" in this manner when we could more sensibly deny the validity of the question. There are certainly analogous situations in computer programming where we want to express the notion that the query is so malformed as to not have a truth value at all, and "null" seems like a sensible value in those cases.
Because null can mean "I don't know", almost every "lifted to nullable" operator in C# results in null if any operand is null. The sum of 123 and null is null because of course the answer to the question "what is the sum of 123 and something I don't know" is "I don't know!" The notable exceptions to this rule are equality, which says that two null values are equal, and the logical "and" and "or" operators, which have some very interesting behaviour. When you say x & y for nullable Booleans, the rule is not "if either is null then the result is null". Rather, the rule is "if either is false then the result is false; otherwise, if either is null then the result is null; otherwise, the result is true". And similarly for x | y: the rule is "if either is true then the result is true; otherwise, if either is null then the result is null; otherwise, the result is false". These rules obey our intuition about what "and" and "or" mean logically, provided that "null" means "I don't know". That is, the truth value of "(something true) or (something I don't know)" is clearly true regardless of whether the thing you don't know is true or false. But if "null" means "the question has no answer at all" then the truth value of "(something true) or (something that makes no sense)" probably should be "something that makes no sense".
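The rules above can be checked with a small program. This is only a sketch: the class name and the helpers Show, And and Or are made up for illustration, and the only language features it relies on are the built-in lifted operators on int? and bool?.

```csharp
using System;

static class LiftedOperatorsDemo
{
    // Illustrative helper: renders a nullable value so that null is visible.
    public static string Show<T>(T? v) where T : struct =>
        v.HasValue ? v.Value.ToString() : "null";

    // Thin wrappers around the built-in lifted operators on bool?.
    public static bool? And(bool? x, bool? y) => x & y;
    public static bool? Or(bool? x, bool? y) => x | y;

    public static void Main()
    {
        int? known = 123;
        int? unknown = null;
        int? alsoUnknown = null;

        // Most lifted operators propagate null.
        Console.WriteLine(Show(known + unknown));      // null

        // Equality is an exception: two null values compare equal.
        Console.WriteLine(unknown == alsoUnknown);     // True

        // & and | are the other exceptions: a known false (for &) or a
        // known true (for |) decides the result even if the other side is null.
        Console.WriteLine(Show(And(false, null)));     // False
        Console.WriteLine(Show(And(true, null)));      // null
        Console.WriteLine(Show(Or(true, null)));       // True
        Console.WriteLine(Show(Or(false, null)));      // null
    }
}
```

Both operators are symmetric, so swapping the operands in any of the last four lines gives the same result.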
Things get weirder though when you start to consider the "short circuiting" operators, && and ||. As you probably know, the && and || operators on Booleans are just like the & and | operators, except that the && operator does not even bother to evaluate the right hand side if the left hand side is false, and the || operator does not evaluate the right hand side if the left hand side is true. After we've evaluated the left hand side of either operator, we *might* have enough information to know the final answer. We can therefore (1) save the expense of computing the other side, and (2) allow the evaluation of the right hand side to depend on a precondition established by the truth or falsity of the left hand side. The most common example of (2) is of course if (s == null || s.Length == 0) because the right hand side would have crashed and burned if evaluated when the left hand side is true.
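To make point (2) concrete, here is a minimal sketch contrasting || with the non-short-circuiting | on ordinary Booleans. The class and method names are invented for this example; in real code you would normally just call string.IsNullOrEmpty.

```csharp
using System;

static class ShortCircuitDemo
{
    // With ||, s.Length is never evaluated when s is null,
    // because the left-hand side is already true.
    public static bool IsNullOrEmptyLazy(string s)
    {
        return s == null || s.Length == 0;
    }

    // With |, both sides are always evaluated, so a null s
    // causes a NullReferenceException on s.Length.
    public static bool IsNullOrEmptyEager(string s)
    {
        return s == null | s.Length == 0;
    }

    public static void Main()
    {
        Console.WriteLine(IsNullOrEmptyLazy(null));    // True
        Console.WriteLine(IsNullOrEmptyLazy(""));      // True
        Console.WriteLine(IsNullOrEmptyLazy("abc"));   // False

        try
        {
            IsNullOrEmptyEager(null);
        }
        catch (NullReferenceException)
        {
            Console.WriteLine("eager version crashed on null");
        }
    }
}
```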
The && and || operators are not "lifted to nullable" because doing so is problematic. The whole point of the short-circuiting operator is to avoid evaluating the right hand side, but we cannot do so and still match the behaviour of the unlifted version! Suppose we have x && y for two nullable Boolean expressions. Let's break down all the cases:
- x is false: We do not evaluate y, and the result is false.
- x is true: We do evaluate y, and the result is the value of y.
- x is null: Now what do we do? We have two choices:
  - We evaluate y, violating the nice property that y is only evaluated if x is true. The result is false if y is false, null otherwise.
  - We do not evaluate y. The result must be either false or null.
    - If the result is false even though y would have evaluated to null, then we have resulted in false incorrectly.
    - If the result is null even though y would have evaluated to false, then we have resulted in null incorrectly.
In short, either we sometimes evaluate y when we shouldn't, or we sometimes return a value that does not match the value that x & y would have produced. The way out of this dilemma is to cut the feature entirely.
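The dilemma can be seen by writing both candidate designs as ordinary methods. Neither helper below exists in C#; both names are hypothetical, and the right-hand side is passed as a Func<bool?> only so that the example can observe whether it was evaluated. Writing x && y directly on two bool? operands is a compile-time error.

```csharp
using System;

static class NullableShortCircuitSketch
{
    // Hypothetical design 1: evaluate y unless x is false.
    // Always agrees with x & y, but evaluates y even when x is null.
    public static bool? AndEvaluateUnlessFalse(bool? x, Func<bool?> y)
    {
        if (x == false) return false;   // the only case that short-circuits
        return x & y();
    }

    // Hypothetical design 2: evaluate y only if x is true.
    // Keeps the "y only runs when x is true" property, but when x is null
    // it has to guess, and here it guesses null.
    public static bool? AndEvaluateOnlyIfTrue(bool? x, Func<bool?> y)
    {
        if (x == false) return false;
        if (x == null) return null;     // disagrees with x & y when y is false
        return y();
    }

    public static void Main()
    {
        bool? x = null;
        bool evaluated = false;
        Func<bool?> y = () => { evaluated = true; return false; };

        bool? first = AndEvaluateUnlessFalse(x, y);
        Console.WriteLine((first?.ToString() ?? "null") + ", y evaluated: " + evaluated);
        // False, y evaluated: True

        evaluated = false;
        bool? second = AndEvaluateOnlyIfTrue(x, y);
        Console.WriteLine((second?.ToString() ?? "null") + ", y evaluated: " + evaluated);
        // null, y evaluated: False   (but null & false is false)
    }
}
```

The first design gives the right answer at the cost of evaluating y when x is null; the second avoids the evaluation and returns a value that x & y would not have produced.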
I said last time that I'd talk about the role of operator true and operator false in C#, but I think I shall leave that to the next episode.
(*) Smullyan's book of combinatory logic puzzles, To Mock A Mockingbird, is equally delightful and I recommend it for anyone who wants a playful introduction to the subject.
Comments
Anonymous
April 12, 2012
"We evaluate y, violating the nice property that y is only evaluated if x is true. The result is false if y is false, null otherwise." How about this: If x is not false, y is evaluated. If you look at it this way, this choice is not a violation. Apparently, that's what Section 7.12 of the C# Language Specification, version 4.0 (Conditional Logical Operators) says: • The operation x && y corresponds to the operation x & y, except that y is evaluated only if x is not false.
Anonymous
April 12, 2012
"... the right hand side would have crashed and burned if evaluated when the left hand side is false" Or "true" as it's known! :o) (If "s == null" is false, "s.Length == 0" won't crash and burn.)
Anonymous
April 12, 2012
I love that Raymond Smullyan book. Here's the hardest puzzle from it, for those looking for a stumper: You're once again at a fork in the road, and again, one path leads to safety, the other to doom. There are three natives at the fork. One is from a village of truth-tellers, one from a village of liars, one from a village of random answerers. Of course you don't know which is which. Moreover, the natives answer "pish" and "posh" for yes and no, but you don't know which means "yes" and which means "no." You're allowed to ask only two yes-or-no questions, each question being directed at one native. What do you ask?
Anonymous
April 12, 2012
PKI certificate checks have the concept of "null" in a way: it's the "Unknown" response rather than "Good" (true) or "Revoked" (false). Unknown can then be a very difficult thing to handle appropriately in all cases. You have to understand why you have an Unknown value: did you fail to connect? Did the server have no knowledge of the certificate? Etc.
Anonymous
April 12, 2012
Oohh logic puzzles I love those - just ordered the book thanks to your recommendation! Slightly OT: One of my favorites: www.xkcd.com/blue_eyes.html (I hope that counts as an exception to the informal rule to never link to xkcd? ;) )
Anonymous
April 12, 2012
Nice story! I can't say there was anything wrong with it.
Anonymous
April 12, 2012
I personally think C# got this one wrong, and SQL has it right: (null && false) == (false && null) == false; (null || true) == (true || null) == true; (null || false) == (false || null) == null; (null && true) == (true && null) == null; and so on. The logical semantics are very clear. Short-circuiting semantics are also very clear - if the first value is sufficient to determine the truth or falsity of the larger expression, then you don't need to evaluate the remainder of the expression. It's just that there are more possible values to consider, that's all. It annoyed us so much at a previous company I worked for that we wrote our own embedded DSL in order to have SQL null semantics on booleans (as well as strings and other reference types).
Anonymous
April 12, 2012
Great article Eric. However -> if (s != null || s.Length == 0)
Anonymous
April 12, 2012
@Barry.Kellly - SQL does not have the && operator. SQL has the AND operator, which does not guarantee "short-circuiting semantics". The AND operator in SQL is closer to the & operator for bool? in C#. The behavior of & on bool? in C# matches the SQL AND operator behavior that you specified.
Anonymous
April 13, 2012
Thanks for this bit of reasoning on bool?. And for the book recommendation. I just ordered a copy.
Anonymous
April 13, 2012
The answer to most of these should be that boolean is the wrong choice. If you believe in strongly-typed languages, then you should at least pick decent types - and boolean-plus-ok-maybe-it-might-be-null isn't ever one of them. You need a new set of types that indicates the real responses, not this null craziness.
Anonymous
April 14, 2012
@James Moore I think this blog post is about language design and compiler implementation to help people use the language better. Your point, which is approximately "we need a bool with five or six enumerations" is valid in a different context. But then, the language provides enumerations, one can simply make a new class for that. Yet, we need nullable basic types for many different reasons, and bool is a basic type. So your statement, "boolean is the wrong choice" strikes me as malappropriate for this context. Thanks Eric, for the great inside information, the likes of which I personally have never seen for any other language. Since I am still on the C# learning curve, these kinds of blog posts help me tremendously.
- Paul in Raleigh
Anonymous
April 15, 2012
I prefer this one: http://xkcd.com/246/
Anonymous
April 15, 2012
I don't think short-circuiting operators present a special problem for this. The obvious definition of "short-circuiting operator" is "the right-hand side is evaluated if and only if it is necessary to determine the answer", which reduces the problem to how the normal and/or operators are defined. The way those operators are defined in Visual Basic happens to require the RHS in both cases: null | true is true, and null & false is false. If they were defined as ordinary "lifting" operators, the result would always be null, so it would need to be evaluated in neither case. "The nice property that y is only evaluated if x is true" is a matter of overgeneralizing from a property of the contract that only applies to non-nullable values.
Anonymous
April 30, 2012
The health information documents standard CDA also has the notion of a null value, but takes it several steps further. Associated with null values is the "flavor", or reason that the value wasn't put into the document: "asked the patient, who didn't know", "didn't ask the patient", "there's information, but I can't tell you because of patient privacy", "the actual value isn't one of the allowed values for this field", "no value is applicable", etc. If you think tri-value logic is fun, consider a system where you have to decide if "couldn't encode the actual value" is the same as "the actual value is too small to encode".