.NET Framework 3.5 SP1: LINQ perf improvements (LINQ to Objects and LINQ to SQL)
There are three perf improvements in the just-released SP1. As always, I will let you run your own microbenchmarks or, more meaningfully, app-level benchmarks.
LINQ to Objects:
Specialized enumerables: The new implementation recognizes queries that apply Where and/or Select to arrays or List<T>s, and folds pipelines of multiple enumerable objects into a single specialized enumerable. This substantially reduces the base overhead of common LINQ to Objects queries (at times by 30% or more).
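To make the idea of folding concrete, here is a minimal C# sketch of what a fused Where/Select enumerable over an array might look like. The type WhereSelectArray<TSource, TResult> and its members are hypothetical names chosen for illustration; they are not the actual SP1 internals, which this post doesn't describe. The point is that stacking another Select onto the object composes delegates instead of adding another iterator layer, so the whole pipeline runs as a single loop over the array.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of pipeline folding over an array. The names are
// illustrative only; they are not the actual .NET 3.5 SP1 internals.
sealed class WhereSelectArray<TSource, TResult> : IEnumerable<TResult>
{
    private readonly TSource[] source;
    private readonly Func<TSource, bool> predicate;
    private readonly Func<TSource, TResult> selector;

    public WhereSelectArray(TSource[] source, Func<TSource, bool> predicate,
                            Func<TSource, TResult> selector)
    {
        this.source = source;
        this.predicate = predicate;
        this.selector = selector;
    }

    // A further Select is folded in by composing the selectors, so the
    // pipeline stays one enumerable instead of gaining another iterator.
    public WhereSelectArray<TSource, TNext> Select<TNext>(Func<TResult, TNext> next)
    {
        Func<TSource, TResult> inner = selector;
        return new WhereSelectArray<TSource, TNext>(source, predicate, x => next(inner(x)));
    }

    // One loop over the array: filter, then project. No per-stage enumerators.
    public IEnumerator<TResult> GetEnumerator()
    {
        for (int i = 0; i < source.Length; i++)
        {
            TSource item = source[i];
            if (predicate(item))
                yield return selector(item);
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

static class Demo
{
    static void Main()
    {
        int[] numbers = { 1, 2, 3, 4, 5, 6 };

        // Fused form of numbers.Where(n => n % 2 == 0).Select(n => n * 10).Select(n => n + 1)
        var fused = new WhereSelectArray<int, int>(numbers, n => n % 2 == 0, n => n * 10)
            .Select(n => n + 1);

        // The ordinary chained LINQ to Objects pipeline, for comparison.
        var chained = numbers.Where(n => n % 2 == 0).Select(n => n * 10).Select(n => n + 1);

        Console.WriteLine(string.Join(",", fused));       // 21,41,61
        Console.WriteLine(fused.SequenceEqual(chained));  // True
    }
}
```

Because the instance Select method takes precedence over the Enumerable.Select extension method, calling .Select on the fused object keeps returning the same specialized type.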
Cast<T> breaking change: This is a bug fix and a breaking change (see this post for background). The intended use of the .NET FX 3.5 Cast<T> extension method is querying over non-generic collection types, whose elements require either a reference conversion or an unboxing step before they can be used in a generic query context. A late change in the VS 2008 cycle allowed the cast to succeed in more situations than intended, such as converting float values to int, where it should instead throw an InvalidCastException. The breaking change reverts to the Beta 2 behavior and improves perf by simplifying the implementation of CastIterator<T>. Value conversions and explicitly defined user conversions, which RTM allowed, now cause an InvalidCastException.
var stringList = new ArrayList { "foo", "bar" };
var intList = new ArrayList { 3, 4, 5 };
var strings = from string s in stringList
              select s;
var ints = from int i in intList
           select i;
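The explicitly typed range variables (string s, int i) make the compiler insert calls to Cast<T>, so the queries above are equivalent to: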
var strings = stringList.Cast<string>();
var ints = intList.Cast<int>();
You can imagine a simplified implementation along these lines:
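static IEnumerable<T> CastIterator<T>(IEnumerable source)
{
    // Only reference conversions and unboxing succeed here; anything else
    // (e.g. a boxed float cast to int) throws InvalidCastException.
    foreach (object obj in source) yield return (T)obj;
}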
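The practical effect of the change shows up in a small program. This sketch assumes .NET Framework 3.5 SP1 or later (later frameworks keep the SP1 behavior); the class name CastDemo is arbitrary. Boxed ints unbox to int, boxed floats do not, and an explicit Select is the way to ask for a value conversion.

```csharp
using System;
using System.Collections;
using System.Linq;

static class CastDemo
{
    static void Main()
    {
        // Boxed ints unbox cleanly to int.
        var ints = new ArrayList { 3, 4, 5 }.Cast<int>().ToList();
        Console.WriteLine(ints.Sum()); // 12

        // Boxed floats cannot be unboxed as int, so Cast<int> throws
        // rather than performing a float-to-int value conversion.
        // ToList() forces enumeration, since Cast<T> is deferred.
        var floats = new ArrayList { 1.5f, 2.5f };
        try
        {
            floats.Cast<int>().ToList();
            Console.WriteLine("converted");
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("InvalidCastException");
        }

        // If a value conversion is what you want, ask for it explicitly:
        // unbox to the actual element type, then convert (truncates).
        var converted = floats.Cast<float>().Select(f => (int)f).ToList();
        Console.WriteLine(string.Join(",", converted)); // 1,2
    }
}
```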
LINQ to SQL:
Identity cache lookup for id-based queries: This too is a bug fix. The original intent was to optimize id-based queries, which are expected to return a single object. If an entity with a matching key value is already in the DataContext identity cache, then translating the query to SQL and executing it against the database is pure waste, since the retrieved row is promptly thrown away to avoid stomping on the user's existing object. With the bug fixed, an id-based query for an entity that is already in the cache no longer causes a trip to the database. The result is a dramatic perf improvement (one hash table lookup instead of SQL translation plus SQL query execution) in an admittedly narrow but common scenario.
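The lookup-before-query logic can be modeled with a toy identity cache. This is a minimal sketch, not LINQ to SQL code: ToyContext, Product, GetById and the DatabaseQueries counter are hypothetical stand-ins for a DataContext, an entity class and an id-based query. A cache hit costs one dictionary lookup; only a miss pays for the (simulated) database round trip, after which the object is cached so later lookups return the same instance.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity type for illustration; not a LINQ to SQL type.
sealed class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical stand-in for a DataContext with an identity cache.
sealed class ToyContext
{
    private readonly Dictionary<int, Product> identityCache = new Dictionary<int, Product>();

    // Counts simulated round trips so the effect of the cache is visible.
    public int DatabaseQueries { get; private set; }

    // Simulates translating the query to SQL and executing it.
    private Product QueryDatabase(int id)
    {
        DatabaseQueries++;
        return new Product { Id = id, Name = "Product " + id };
    }

    // Id-based lookup: check the identity cache first (one hash lookup);
    // go to the database only on a miss, then cache the new object.
    public Product GetById(int id)
    {
        Product cached;
        if (identityCache.TryGetValue(id, out cached))
            return cached;

        Product fromDb = QueryDatabase(id);
        identityCache[id] = fromDb;
        return fromDb;
    }
}

static class IdentityDemo
{
    static void Main()
    {
        var db = new ToyContext();
        Product first = db.GetById(42);   // miss: one simulated database query
        Product second = db.GetById(42);  // hit: served from the identity cache

        Console.WriteLine(object.ReferenceEquals(first, second)); // True
        Console.WriteLine(db.DatabaseQueries);                    // 1
    }
}
```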
BTW, as mentioned in a previous post, I haven't worked on either component for SP1. But I was deeply involved in both for 3.5 RTM, so I can't resist tracking such sweet changes. Besides, I am working on a LINQ book that keeps me closely involved with the components.
Dinesh
Comments
Anonymous
August 11, 2008
Dinesh Kulkarni wrote an important post about changes in LINQ introduced by .NET 3.5 SP1 that has been
Anonymous
August 11, 2008
In May, I announced the changes related to LINQ that were included in .NET 3.5 Service Pack 1 Beta. Now
Anonymous
August 18, 2008
Hi Dinesh, "So an id-based query will not cause a trip to the database." Excellent! :-) Now the next thing in this vein is to allow me to ask for objects by ID and get them without hitting the database /even when the object is not in the cache/! For example, I want to create a new Product and set its Category to point to the category with ID 25. To do that I want to ask for the Category object with ID 25 and to get it (without asking the DB) regardless of whether it is in the cache or not. The result should be the same as setting a primitive CategoryID property on the Product to 25. At the end of the operation, only one SQL statement should have been sent to the database: the Insert statement.
/Mats
Anonymous
August 21, 2008
When we wrote LINQ in Action, we took a bit of time to explain how the identity tracking system worked
Anonymous
September 02, 2008
Mats,
First, my apologies on behalf of the new spam filter. I don't have moderation turned on so I am less punctual about monitoring comments. For some reason that I can't figure out, your comment was flagged as spam and remained unpublished.
As for your suggestion about not retrieving Category, why do you need the "fake" Category object anyway? You can just set Product.CategoryID to 25 if you like. What value does a new'ed up Category provide?
Thanks,
Dinesh