Breaking Change in LINQ Queries Using Explicitly-Typed Range Variables
There's a change coming in .NET Framework 3.5 Service Pack 1 that will affect some programs containing queries that explicitly specify the type of the range variable. The affected queries are those whose range variable type differs from the element type of the sequence being queried and the element type cannot be converted to the range variable type via reference conversion or boxing/unboxing conversion. Whew, that was a bunch of spec-speak. To help understand, consider the following query.
var floats = new ArrayList { 2.5f, 3.5f, 4.5f };
var ints = from int i in floats
select i;
Iterating over this query yields some surprising results, {2, 4, 4} . Why not {2, 3, 4} as one might expect? To see why this happens, let's start with the compiler's translation of this query into a series of method calls. The above query expression is rewritten into the following.
var ints = floats.Cast<int>().Select<int,int>(i => i);
Follow the flow of type information through this query. The source sequence "floats" is an ArrayList, which implements the non-generic IEnumerable. Cast<int>() takes this sequence as IEnumerable and returns a sequence implementing IEnumerable<int> . Select<int,int>() acts upon that sequence and returns another sequence implementing IEnumerable<int> . Now look at the signature of Cast<T> .
public static IEnumerable<T> Cast<T>(this IEnumerable source)
This method's purpose is to convert a non-generic IEnumerable sequence whose elements are of some type T (or boxed T as the case may be) to IEnumerable<T> for use as an argument to the subsequent sequence operators, which must know the compile-time type of the sequence elements. It sounds simple enough, and it should be, but due to a late-game foul-up in development, it's not.
The body of Cast<T> should effectively have these semantics: roll through the sequence converting each element to the target type T, iterator style. Something like this.
foreach (object obj in sourceSequence) yield return (T)obj;
Now, looking back at the original query, if Cast<T> were implemented with these semantics, a runtime exception would occur at (T)obj, the cast from boxed float to int. Can't do that. You have to convert from boxed float to float. Then you can convert to int.
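To make the unboxing rule concrete, here is a minimal sketch of the intended semantics. CastSketch<T> is a hypothetical stand-in written for this illustration, not the framework's actual source; it simply wraps the one-line iterator body shown above.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical helper (not the framework implementation): the intended
// semantics of Cast<T>, written as an iterator.
static IEnumerable<T> CastSketch<T>(IEnumerable source)
{
    foreach (object obj in source)
        yield return (T)obj; // unboxing requires the exact boxed type
}

var floats = new ArrayList { 2.5f, 3.5f, 4.5f };

bool threw = false;
try
{
    // The iterator is lazy, so the failure surfaces during enumeration.
    foreach (int i in CastSketch<int>(floats))
        Console.WriteLine(i);
}
catch (InvalidCastException)
{
    threw = true; // a boxed float cannot be unboxed directly to int
}
Console.WriteLine(threw);

object boxed = 2.5f;
int viaFloat = (int)(float)boxed; // unbox to float first, then convert to int
Console.WriteLine(viaFloat);
```

The two-step cast at the end is the "convert from boxed float to float, then to int" path described above, and it truncates rather than rounds.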
But this isn't the shipping semantics of Cast<T> , and "magically" you can convert the sequence of boxed floats to a sequence of ints, you just get, uh, Banker's rounding as opposed to truncation. Banker's rounding (round to even) is not the C# user's expected behavior when converting float to int. I'm not sure it's anyone's expected semantics, but, sadly, it is what we shipped.
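The difference between the two rounding behaviors can be seen side by side in the sketch below. Convert.ToInt32 is used here only as a stand-in that rounds ties to even; the post does not say how the shipping Cast<T> performs its conversion, so treat this as an illustration of the observed results rather than of the implementation.

```csharp
using System;

float[] values = { 2.5f, 3.5f, 4.5f };
var truncated = new int[values.Length];
var rounded = new int[values.Length];

for (int n = 0; n < values.Length; n++)
{
    truncated[n] = (int)values[n];            // C# cast: truncates toward zero
    rounded[n] = Convert.ToInt32(values[n]);  // rounds to nearest, ties go to the even number
}

Console.WriteLine(string.Join(", ", truncated)); // 2, 3, 4
Console.WriteLine(string.Join(", ", rounded));   // 2, 4, 4
```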
The fix
In .NET Framework 3.5 Service Pack 1 (SP1) we are going to return Cast<T> to its intended semantics described above. Not only is the current behavior not intuitive, it's slow as Christmas. But fixing this is obviously a breaking change. Once you get SP1 you may find that queries which once worked now throw exceptions. That's not great, but it's something that can easily be dealt with by developers - change the type of the range variable to match the collection element type and then, as necessary, add casts where the range variable is used.
But one important thing to understand about this change is that the breaking change is in the .NET Framework libraries, not the compiler. Cast<T> is a framework method. This means that if your application contains a problematic query and has been distributed to users, it will begin to throw when your user gets SP1.
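As a sketch of the workaround described above, the original query can be rewritten so that the range variable type matches the boxed element type, with an explicit cast where the value is used. The variable names follow the earlier example.

```csharp
using System;
using System.Collections;
using System.Linq;

var floats = new ArrayList { 2.5f, 3.5f, 4.5f };

// The range variable type now matches the boxed element type,
// so the generated Cast<float>() call unboxes each element successfully.
var ints = from float f in floats
           select (int)f; // explicit conversion at the point of use: truncates

Console.WriteLine(string.Join(", ", ints)); // 2, 3, 4
```

Written this way, the query should behave the same before and after SP1, since no element is ever unboxed to a type other than its own.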
Avoid the problem altogether by omitting the range variable type
The call to Cast<T> in the above query expression was introduced by the rewriter in response to the presence of an explicitly-typed range variable. That's how the syntactic rewrite rules are specified. But if you omit the "int" in "from int i" no call to Cast<T> is generated.
Specification of the type of the range variable is optional in the query syntax, but if you're using a collection that only implements IEnumerable, you've got to specify it. On the other hand, when using a collection that implements IEnumerable<T> you can, and should, omit the range variable type. Not only does it avoid this entire can of worms, but it has the performance benefit of omitting an unneeded iterator in the chain of iterators mentioned before.
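For comparison, here is a minimal sketch of the same query against a generic collection, assuming the data can be held in a List<float> instead of an ArrayList. With the range variable type omitted, no Cast<T> call is generated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var floats = new List<float> { 2.5f, 3.5f, 4.5f };

// No range variable type: the compiler infers float from IEnumerable<float>,
// so the query translates to floats.Select(f => (int)f) with no Cast<T> step.
var ints = from f in floats
           select (int)f;

Console.WriteLine(string.Join(", ", ints)); // 2, 3, 4
```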
Comments
Anonymous
February 16, 2008
An article on the MSDN blogs mentions a fix that .NET 3.5 SP1 will bring; see the following code: var floats = new ArrayList { 2.5f, 3.5f, 4.5f }; ...
Anonymous
February 18, 2008
Thankfully this won't affect me, but... I was wondering where I can find info on the .NET 3.5 Service Pack 1. Google hasn't been my friend... When is it due to ship?
Anonymous
February 19, 2008
Ouch! The original behavior was bad, but "it might behave the bad way, or another way, depending on what version of the CLR it's running on" is much worse. Developers could find themselves fighting the "how does it behave when modulus operations are buggy" battle that has done so much damage to the Java platform. This change should have been left until version 4.
Anonymous
February 25, 2008
alunharford: Given that the age of any (legal) code is still fairly young, I wouldn't expect this to be painful to correct to conform to the suggestion of not explicitly stating the range type. Indeed, for the large majority of LINQ users, I doubt there's been a need to state the range type at all. Given the performance benefit of the fix and what I figure will be relatively few actual breakages, I think it's appropriate to do this change now. Heck, even those which would break have plenty of time to code around the breaking area. They could conceivably write their own cast operator if they wanted. Waiting until version 4 would have given an established base of code that would be much larger and more complex. You would have to create a Cast2<T> method, which would just be wrong.
Anonymous
March 11, 2008
Welcome to the forty-first Community Convergence. The big news this week is that we have moved Future
Anonymous
March 14, 2008
Slightly strange question, I know - but why is the call to Select(i => i) included in the first place? Shouldn't it be removed as a degenerate query expression, given that there's a call to Cast?
Anonymous
June 02, 2008
We've just released a new community technology preview (CTP) of Parallel Extensions to the .NET Framework!
Anonymous
August 11, 2008
There are three perf improvements in the soon to be released SP1. As always, I will let you run your
Anonymous
August 11, 2008
Dinesh Kulkarni wrote an important post about changes in LINQ introduced by .NET 3.5 SP1 that has been
Anonymous
August 12, 2008
I found some interesting blog posts about breaking changes in the SP1 for Visual Studio 2008 and the