Improvements to the Generated SQL in .NET 4.0 Beta1
The information in this post is out of date.
Visit msdn.com/data/ef for the latest information on current and past releases of EF.
One of the things that we are continuously working on improving is the quality and the readability of the SQL generated when querying using the Entity Framework. We have already made some improvements in .NET 4.0 Beta 1, and we are working on more for .NET 4.0 post Beta 1.
Most of the improvements we have made are in the Entity Framework query pipeline as opposed to specific SqlClient changes. Because the changes were in the query pipeline, this results in simpler output command trees and thus would affect the SQL generated by any backend, not only by our SQL Server provider.
Below we’ve highlighted the biggest improvements available in Beta 1 and then the ones that will be in the next public release of .NET Framework 4.0. Unless otherwise noted, the examples are based on the NorthwindEF Entity Data Model provided with the ADO.NET Entity Framework Query Samples (https://code.msdn.microsoft.com/EFQuerySamples). Also, in all the examples below, the parts of the SQL query highlighted in red have been removed and those in yellow have been added.
1. Avoid projecting all columns in nested queries
In some nested queries in the generated SQL, we used to explicitly project all columns that are brought in scope by the corresponding FROM clause, like the fragments marked red in the sample generated SQL below. Now, we instead only project the columns that are later referenced. This improvement is specific to our T-Sql Generation in SqlClient.
Example:
var
query = (
from
p in context.Products
orderby
p.ProductName
select
p.ProductName)
.Skip(2).Take(2);
2. Avoiding Unnecessary IS NULL Checks in LINQ to Entities Queries
When translating joins in LINQ to Entities, we check whether the columns on which the join is specified are equal or both are null. In the cases when one (or both) of the columns is non-nullable, the “IS NULL” check is redundant and has been removed. This improvement is in the Entity Framework query pipeline.
Example:
var
query = context.Products.Join(
context.Categories,
p => p.ProductID,
c => c.CategoryID,
(p, c) =>
new
{ p.ProductID, c.CategoryID, p.ProductName, c.CategoryName });
3. Additional join elimination: Eliminate parent table in “Child Left Outer Join Parent”
In scenarios where a given table is explicitly or implicitly (e.g. via a navigation property) joined to a parent table (i.e. table to which it has a foreign key constraint) we were not eliminating the parent table even if the only columns referenced from that table were the columns comprising the primary key and could have been reused from the child table. (Note: The parent-child terminology assumes that there is foreign key relationship between these tables specified in the SSDL). This improvement is in the Entity Framework query pipeline.
Example:
Note: For this example the highlighted following fragments need to be added to the SSDL in the ADO.NET Entity Framework Query Samples noted above:
string entitySQL = "SELECT p.Category.CategoryId FROM Products as p";
For LINQ to Entities in .NET Framework 3.5 SP1 the equivalent query resulted in the same generated query for Entity SQL shown in the After sample.
4. Using IN instead of nested OR’s
Comparing a column to multiple values, either as a result of the explicit use of Entity SQL’s IN expression or as result of internally generated checks over a type discriminator value used to result in nested OR expressions in the generated SQL. Now, we are instead producing an IN expression. This improvement is specific to our T-Sql Generation in SqlClient.
For example:
string entitySQL = "SELECT p FROM Products as p where p.ProductId in {1, 2, 3, 4, 5}";
Starting with.NET 4.0 Beta 1 Entity Framework, a LINQ version of this query is also supported:
var
query = context.Products.Where(p =>
new int
[] { 1, 2, 3, 4, 5 }.Contains(p.ProductID));
The resulting generated TSQL also takes advantage of this improvement:
5. Translating more of LINQ’s GroupBy operator into GROUP BY
LINQ’s GroupBy operator is richer then SQL’s group by clause. For example, in addition to aggregates over a group it can return the entire group. When translating LINQ’s GroupBy, we try to recognize if it can be expressed by SQL’s group by (i.e. DbGroupByExpression without a group partition), otherwise we translate it into a more complex expression involving a join to the input table.
In Entity Framework 4.0 Beta1 we have expanded the cases that we are able to translate into SQL’s GroupBy. This improvement is in the Entity Framework query pipeline.
Example:
var
query = context.Products.Where(p => p.ProductID < 4)
.GroupBy(p => p.ProductID, p => p.ProductID, (key, group) =>
new { Key = key, Max = group.Max() });
6. Avoiding “Cast (1 as bit)”, using “1” instead
In cases when we need an internally generated constant, like when translating EntitySQL Exists expression or IEnumerable.Count in LINQ to Entities, we previously used “true”, which translates to “cast(1 as bit)” on SQL Server. We now instead use the integer constant 1, which translates to “1”. This improvement is in the Entity Framework query pipeline.
var
query = context.Categories.Select(c => new { c.CategoryID, count = c.Products.Count() });
7. Simplifying some queries that go against a subtype in a type hierarchy
In some cases when a query is only interested in entities of a particular subtype (specified via OfType or IsOf for example) we used to query using a view over the base type and thus possibly generate some unnecessary case statements to also check for the types that are not of the desired subtype. This improvement allows us to use the simplified query view that is only over the desired subtype and thereby generate a simpler provider query.
The following example that illustrates that is over a schema with a TPH mapping with the inheritance hierarchy as shown in the figure. This improvement is in the Entity Framework query pipeline.
var
query = this.tph.Entities.OfType<Tph.
Derived1
>().GroupBy(d2 => d2).Select(p => p.Max(g => g.Id));
Improvements in .NET 4.0 post Beta1
Here is a quick overview of the improvements that we are working on for this release but are not included in Beta1:
- Elimination of null sentinels – In the output query there is often the constant 1 projected both in nested and in top-most queries. By avoiding it and reusing other columns for the same purpose in many cases we are able to avoid levels of nesting.
- Additional Join Elimination
- Use of Inner Joins instead of Left Outer Joins when possible
- Provide mechanism for efficient queries on non-Unicode columns – We now generate non-unicode constants and parameters when these are used in LINQ to Entities queries in comparisons with non-unicode columns. This allows indexes to be used by SQL Server.
- Improve the translation of the functions String.StartsWith, String.Contains and String.EndsWith in LINQ to Entites to use LIKE.
- Improve the translation of the canonical functions StartsWith, Contains and EndsWith to use LIKE (these canonical functions became available in .NET 4.0 Beta1)
- Collapse multiple Case statements into one
Kati Iceva
Software Development Engineer, Entity Framework
Comments
Anonymous
August 05, 2009
Post Beta 1: "Improve the translation of the functions String.StartsWith, String.Contains and String.EndsWith in LINQ to Entites to use LIKE. " In the current release (3.5) we use the ado.net data services to access the EF and one mayor problem is the performance of the above mentioned improvment. Please... implement this feature in the final release of EF4. Thanx MarcoAnonymous
August 05, 2009
Thank you so much Kati, This is just in time, I was about to start working on it to make this comparison, you just saved my time. And good job EF team. Good luck with the restAnonymous
August 07, 2009
Hi, Will DefaultIfEmpty() operator be supported in EF v2 (LINQ outter joins)? Thanks, GaborAnonymous
August 07, 2009
Hi Gabor, Yes, DefaultIfEmpty() will be supported in EF in .NET 4.0. Thanks, KatiAnonymous
August 07, 2009
Hi Gabor, Yes, DefaultIfEmpty() will be supported in EF in .NET 4.0. Thanks, KatiAnonymous
August 10, 2009
Asside from not being able to do this Product.ID == (Guid) SomeObject in a where clause causing all kinds of variable declarations in our code all of the time for no particular reason, the most frustrating thing is doing Left Outer Joins. These are a major PITA in linq. PLEASE PLEASE PLEASE add an outerjoin command that simplifies this so that I can do simple left outer joins. The only other thing is that I have all kinds of cases where I want to return a property that isn't part of the database in the select. This should be supported, because it should know to select on the database the ones that are applicable to the database, then pull the proper properties for the rest from the object. Linq to SQL seems to do a pretty good job of this, but Linq to Entities is horrible. Solving these issues and your support for functions will make .NET 4 fantastic.... well that and PICO and non diagram based implimentations so that people can inherit from our objects and add fields etc....Anonymous
August 11, 2009
It's great to see improvements in the SQL generated by the EF. One thing that I think could be a small improvement is to use shorter generated names for the tables/extents/projections. Linq2Sql uses [t0], [t1], etc and the shorter names IMHO make for less "noise" in the SQL and a more readable query. Plus every byte counts when sending SQL statements over the wire, especially when going from a local client to a remote database over an encrypted connection. Just something to consider. Also, +1 on what James said.Anonymous
August 13, 2009
Coming from a biztalk environment, will For XML still be supported?Anonymous
August 17, 2009
Thanks for your comments. James, Entity Framework 4.0 will support DefaultIfEmpty, whic is used for achieving a LeftOuterJoin in Linq. Shawn, we will consider your suggestion. Thanks, KatiAnonymous
August 18, 2009
Absolutely agree with Shawn about the shorter alias names. Shorter names are cleaner, easier to read, cut bandwidth costs, and I'd imagine it would also mean slightly faster query parsing on the server side too.Anonymous
August 27, 2009
The comment has been removedAnonymous
September 03, 2009
Great! Good to see you are investing a lot in were we put our future in :) "Use of Inner Joins instead of Left Outer Joins when possible" I hope this means: use inner joins when relation is 1 and use outer join when relation is 0..1. In this version the outer join does not work correctly in hierarchy classes f.e. order 0..1 to person where person is an extension of user becomes an inner join ;( It would be preferrable if the developer could force it with a keyword. Also an isnull on required fields in an outer join is really needed because ?? does not work.Anonymous
September 26, 2009
- It would be very helpfull if u make a menu command to refresh all tables .xml files by getting the last changes from the database instead of us going to each .xml file and refreshing each table one by one. at least for the tables that we submitted to get all the columns like: Select * from table1.
- Also it would be very good if there is a command that will check the tables in all the files in the Visual Studio solution and their real ones in SQL Server.
Anonymous
October 03, 2009
Hello, This simple query against AdventureWorks database using EF4.0 Beta 1 seems to produce some pretty horrible SQL – superfluous aliases everywhere, a redundant FROM clause, the same condition being tested twice etc. LINQ TO EF: bool exists = context.Customers.Any(c => c.AccountNumber == "AW00000003"); GENERATED SQL: SELECT CASE WHEN ( EXISTS (SELECT 1 AS [C1] FROM [Sales].[Customer] AS [Extent1] WHERE N'AW00000003' = [Extent1].[AccountNumber] )) THEN cast(1 as bit) WHEN ( NOT EXISTS (SELECT 1 AS [C1] FROM [Sales].[Customer] AS [Extent2] WHERE N'AW00000003' = [Extent2].[AccountNumber] )) THEN cast(0 as bit) END AS [C1] FROM ( SELECT 1 AS X ) AS [SingleRowTable1] EXPECTED: SELECT CASE WHEN ( EXISTS (SELECT 1 FROM [Sales].[Customer] AS [Extent1] WHERE N'AW00000003' = [Extent1].[AccountNumber] )) THEN 1 ELSE 0 END AS [C1]Anonymous
October 03, 2009
For TPT Inheritance, how about the ability to specify a discriminator column in the base entity to help relate to derived entities? Example: A Base entity of Vehicle would have a discriminator column called VehicleType. Then the derived entity, Car, would be conditionally related to Vehicle when Vehicle.VehicleType = [some constant]. Seems like you could make the SQL more efficient if the join to the derived table uses the VehicleType in the condition instead of just an outer join based on the pk/fk. It would save the DB from even scanning the derived table for the PK unless it's VehicleType matched first. Don't a lot of databases have that discriminator column anyways in this type of situation?Anonymous
October 23, 2009
It would nice if the Linq method Contains would be included and translated to the sql IN statement: EntitySet.Where(e=>{1,2,3}.Contains(e.Id)); converted to sql: select ... where Id IN (1,2,3)Anonymous
October 28, 2009
Will you guys deveop a full text -- Contains, freetext, etc feature in Linq to Entities? If not when will this feature be available? I still find myself using inline sql for full text.Anonymous
December 22, 2009
Aquilax. in EF 4 you can use contains as your example shows.Anonymous
December 22, 2009
It would be nice if you could specifiy IEnumerables as parameters to a compiled query i..e var GetFooBar = CompiledQuery.Compile<testEntities, int, int[], IQueryable<int?>>((ctx, bar, list) => from f in ctx.foos where f.bar == bar && list.Contains(f.bar.Value) select f.bar); When you try and execute the query it complains because int[] is not a simple type.Anonymous
October 15, 2010
Well maybe .net 4.0 is out but for future versions, it would be very helpful if when retrieving a table field value from the generated classes and that field is a null that there is a property at the dataset to suppress the exception from raising. so we can use them anywhere instead if us always handling the the exception. FYI, in each table most of the times that most fields are not mandatory so they can have a null as a normal value.Anonymous
October 25, 2010
The comment has been removed