Improvements to the Generated SQL in .NET 4.0 Beta1

 


The information in this post is out of date.

Visit msdn.com/data/ef for the latest information on current and past releases of EF.


 

One of the things that we are continuously working on improving is the quality and the readability of the SQL generated when querying using the Entity Framework. We have already made some improvements in .NET 4.0 Beta 1, and we are working on more for .NET 4.0 post Beta 1.

Most of the improvements we have made are in the Entity Framework query pipeline as opposed to specific SqlClient changes. Because the changes were in the query pipeline, this results in simpler output command trees and thus would affect the SQL generated by any backend, not only by our SQL Server provider.

Below we’ve highlighted the biggest improvements available in Beta 1 and then the ones that will be in the next public release of .NET Framework 4.0. Unless otherwise noted, the examples are based on the NorthwindEF Entity Data Model provided with the ADO.NET Entity Framework Query Samples (https://code.msdn.microsoft.com/EFQuerySamples). Also, in all the examples below, the parts of the SQL query highlighted in red have been removed and those in yellow have been added.

1. Avoid projecting all columns in nested queries

In some nested queries in the generated SQL, we used to explicitly project all columns that are brought in scope by the corresponding FROM clause, like the fragments marked red in the sample generated SQL below. Now, we instead only project the columns that are later referenced. This improvement is specific to our T-Sql Generation in SqlClient.

Example:

var

  query = (

from

  p in context.Products
                 

orderby

  p.ProductName
                 

select

  p.ProductName)
                 .Skip(2).Take(2);

blog1

2. Avoiding Unnecessary IS NULL Checks in LINQ to Entities Queries

When translating joins in LINQ to Entities, we check whether the columns on which the join is specified are equal or both are null. In the cases when one (or both) of the columns is non-nullable, the “IS NULL” check is redundant and has been removed. This improvement is in the Entity Framework query pipeline.

Example:

var

  query = context.Products.Join(
context.Categories, 
p => p.ProductID,
c => c.CategoryID,
(p, c) => 

new

 { p.ProductID, c.CategoryID, p.ProductName, c.CategoryName });

blog1

3. Additional join elimination:  Eliminate parent table in “Child Left Outer Join Parent”

In scenarios where a given table is explicitly or implicitly (e.g. via a navigation property) joined to a parent table (i.e. table to which it has a foreign key constraint) we were not eliminating the parent table even if the only columns referenced from that table were the columns comprising the primary key and could have been reused from the child table. (Note: The parent-child terminology assumes that there is foreign key relationship between these tables specified in the SSDL). This improvement is in the Entity Framework query pipeline.

Example:

Note: For this example the highlighted following fragments need to be added to the SSDL in the ADO.NET Entity Framework Query Samples noted above:

blog1

 string entitySQL = "SELECT p.Category.CategoryId FROM Products as p";

blog1

For LINQ to Entities in .NET Framework 3.5 SP1 the equivalent query resulted in the same generated query for Entity SQL shown in the After sample.

4. Using IN instead of nested OR’s

Comparing a column to multiple values, either as a result of the explicit use of Entity SQL’s IN expression or as result of internally generated checks over a type discriminator value used to result in nested OR expressions in the generated SQL. Now, we are instead producing an IN expression. This improvement is specific to our T-Sql Generation in SqlClient.

For example:

 string entitySQL = "SELECT p FROM Products as p where p.ProductId in {1, 2, 3, 4, 5}";

blog1

Starting with.NET 4.0 Beta 1 Entity Framework, a LINQ version of this query is also supported:

 

var

  query = context.Products.Where(p => 

new int

 [] { 1, 2, 3, 4, 5 }.Contains(p.ProductID));    

The resulting generated TSQL also takes advantage of this improvement:

blog1

5. Translating more of LINQ’s GroupBy operator into GROUP BY

LINQ’s GroupBy operator is richer then SQL’s group by clause. For example, in addition to aggregates over a group it can return the entire group. When translating LINQ’s GroupBy, we try to recognize if it can be expressed by SQL’s group by (i.e. DbGroupByExpression without a group partition), otherwise we translate it into a more complex expression involving a join to the input table.

In Entity Framework 4.0 Beta1 we have expanded the cases that we are able to translate into SQL’s GroupBy. This improvement is in the Entity Framework query pipeline.

Example:

var

  query = context.Products.Where(p => p.ProductID < 4)
            .GroupBy(p => p.ProductID, p => p.ProductID, (key, group) => 
            new { Key = key, Max = group.Max() });

blog1

6. Avoiding “Cast (1 as bit)”, using “1” instead

 

In cases when we need an internally generated constant, like when translating EntitySQL Exists expression or IEnumerable.Count in LINQ to Entities, we previously used “true”, which translates to “cast(1 as bit)” on SQL Server. We now instead use the integer constant 1, which translates to “1”. This improvement is in the Entity Framework query pipeline.

var

  query = context.Categories.Select(c => new { c.CategoryID, count = c.Products.Count() });

blog1

7. Simplifying some queries that go against a subtype in a type hierarchy

In some cases when a query is only interested in entities of a particular subtype (specified via OfType or IsOf for example) we used to query using a view over the base type and thus possibly generate some unnecessary case statements to also check for the types that are not of the desired subtype. This improvement allows us to use the simplified query view that is only over the desired subtype and thereby generate a simpler provider query.

The following example that illustrates that is over a schema with a TPH mapping with the inheritance hierarchy as shown in the figure. This improvement is in the Entity Framework query pipeline.

 

 

image

var

  query = this.tph.Entities.OfType<Tph.

Derived1

 >().GroupBy(d2 => d2).Select(p => p.Max(g => g.Id));

blog1

 

blog1

Improvements in .NET 4.0 post Beta1

Here is a quick overview of the improvements that we are working on for this release but are not included in Beta1:

  • Elimination of null sentinels – In the output query there is often the constant 1 projected both in nested and in top-most queries. By avoiding it and reusing other columns for the same purpose in many cases we are able to avoid levels of nesting.
  • Additional Join Elimination
  • Use of Inner Joins instead of Left Outer Joins when possible
  • Provide mechanism for efficient queries on non-Unicode columns – We now generate non-unicode constants and parameters when these are used in LINQ to Entities queries in comparisons with non-unicode columns. This allows indexes to be used by SQL Server.
  • Improve the translation of the functions String.StartsWith, String.Contains and String.EndsWith in LINQ to Entites to use LIKE.
  • Improve the translation of the canonical functions StartsWith, Contains and EndsWith to use LIKE (these canonical functions became available in .NET 4.0 Beta1)
  • Collapse multiple Case statements into one

 Kati Iceva
Software Development Engineer, Entity Framework

Comments

  • Anonymous
    August 05, 2009
    Post Beta 1: "Improve the translation of the functions String.StartsWith, String.Contains and String.EndsWith in LINQ to Entites to use LIKE. " In the current release (3.5) we use the ado.net data services to access the EF and one mayor problem is the performance of the above mentioned improvment. Please... implement this feature in the final release of EF4. Thanx Marco

  • Anonymous
    August 05, 2009
    Thank you so much Kati, This is just in time, I was about to start working on it to make this comparison, you just saved my time. And good job EF team. Good luck with the rest

  • Anonymous
    August 07, 2009
    Hi, Will DefaultIfEmpty() operator be supported in EF v2 (LINQ outter joins)? Thanks, Gabor

  • Anonymous
    August 07, 2009
    Hi Gabor, Yes, DefaultIfEmpty() will be supported in EF in .NET 4.0. Thanks, Kati

  • Anonymous
    August 07, 2009
    Hi Gabor, Yes, DefaultIfEmpty() will be supported in EF in .NET 4.0. Thanks, Kati

  • Anonymous
    August 10, 2009
    Asside from not being able to do this Product.ID == (Guid) SomeObject in a where clause causing all kinds of variable declarations in our code all of the time for no particular reason, the most frustrating thing is doing Left Outer Joins. These are a major PITA in linq. PLEASE PLEASE PLEASE add an outerjoin command that simplifies this so that I can do simple left outer joins. The only other thing is that I have all kinds of cases where I want to return a property that isn't part of the database in the select.  This should be supported, because it should know to select on the database the ones that are applicable to the database, then pull the proper properties for the rest from the object. Linq to SQL seems to do a pretty good job of this, but Linq to Entities is horrible. Solving these issues and your support for functions will make .NET 4 fantastic.... well that and PICO and non diagram based implimentations so that people can inherit from our objects and add fields etc....

  • Anonymous
    August 11, 2009
    It's great to see improvements in the SQL generated by the EF.  One thing that I think could be a small improvement is to use shorter generated names for the tables/extents/projections.  Linq2Sql uses [t0], [t1], etc and the shorter names IMHO make for less "noise" in the SQL and a more readable query.  Plus every byte counts when sending SQL statements over the wire, especially when going from a local client to a remote database over an encrypted connection.  Just something to consider. Also, +1 on what James said.

  • Anonymous
    August 13, 2009
    Coming from a biztalk environment, will For XML still be supported?

  • Anonymous
    August 17, 2009
    Thanks for your comments. James, Entity Framework 4.0 will support DefaultIfEmpty, whic is used for achieving a LeftOuterJoin in Linq. Shawn, we will consider your suggestion. Thanks, Kati

  • Anonymous
    August 18, 2009
    Absolutely agree with Shawn about the shorter alias names. Shorter names are cleaner, easier to read, cut bandwidth costs, and I'd imagine it would also mean slightly faster query parsing on the server side too.

  • Anonymous
    August 27, 2009
    The comment has been removed

  • Anonymous
    September 03, 2009
    Great! Good to see you are investing a lot in were we put our future in :) "Use of Inner Joins instead of Left Outer Joins when possible" I hope this means: use inner joins when relation is 1 and use outer join when relation is 0..1. In this version the outer join does not work correctly in hierarchy classes f.e. order 0..1 to person where person is an extension of user becomes an inner join ;( It would be preferrable if the developer could force it with a keyword. Also an isnull on required fields in an outer join is really needed because ?? does not work.

  • Anonymous
    September 26, 2009

  1. It would be very helpfull if u make a menu command to refresh all tables .xml files by getting the last changes from the database instead of us going to each .xml file and refreshing each table one by one. at least for the tables that we submitted to get all the columns like: Select * from table1.
  2. Also it would be very good if there is a command that will check the tables in all the files in the Visual Studio solution and their real ones in SQL Server.
  • Anonymous
    October 03, 2009
    Hello, This simple query against AdventureWorks database using EF4.0 Beta 1 seems to produce some pretty horrible SQL – superfluous aliases everywhere, a redundant FROM clause, the same condition being tested twice etc. LINQ TO EF: bool exists = context.Customers.Any(c => c.AccountNumber == "AW00000003"); GENERATED SQL: SELECT CASE WHEN ( EXISTS (SELECT 1 AS [C1] FROM [Sales].[Customer] AS [Extent1] WHERE N'AW00000003' = [Extent1].[AccountNumber] )) THEN cast(1 as bit) WHEN ( NOT EXISTS (SELECT 1 AS [C1] FROM [Sales].[Customer] AS [Extent2] WHERE N'AW00000003' = [Extent2].[AccountNumber] )) THEN cast(0 as bit) END AS [C1] FROM  ( SELECT 1 AS X ) AS [SingleRowTable1] EXPECTED: SELECT CASE WHEN ( EXISTS (SELECT 1 FROM [Sales].[Customer] AS [Extent1] WHERE N'AW00000003' = [Extent1].[AccountNumber] )) THEN 1 ELSE 0 END AS [C1]

  • Anonymous
    October 03, 2009
    For TPT Inheritance, how about the ability to specify a discriminator column in the base entity to help relate to derived entities? Example: A Base entity of Vehicle would have a discriminator column called VehicleType.  Then the derived entity, Car, would be conditionally related to Vehicle when Vehicle.VehicleType = [some constant]. Seems like you could make the SQL more efficient if the join to the derived table uses the VehicleType in the condition instead of just an outer join based on the pk/fk.  It would save the DB from even scanning the derived table for the PK unless it's VehicleType matched first. Don't a lot of databases have that discriminator column anyways in this type of situation?

  • Anonymous
    October 23, 2009
    It would nice if the Linq method Contains would be included and translated to the sql IN statement: EntitySet.Where(e=>{1,2,3}.Contains(e.Id)); converted to sql: select ... where Id IN (1,2,3)

  • Anonymous
    October 28, 2009
    Will you guys deveop a full text -- Contains, freetext, etc feature in Linq to Entities? If not when will this feature be available? I still find myself using inline sql for full text.

  • Anonymous
    December 22, 2009
    Aquilax. in EF 4 you can use contains as your example shows.

  • Anonymous
    December 22, 2009
    It would be nice if you could specifiy IEnumerables as parameters to a compiled query i..e    var GetFooBar = CompiledQuery.Compile<testEntities, int, int[], IQueryable<int?>>((ctx, bar, list) =>                    from f in ctx.foos                    where f.bar == bar                    && list.Contains(f.bar.Value)                    select f.bar); When you try and execute the query it complains because int[] is not a simple type.

  • Anonymous
    October 15, 2010
    Well maybe .net 4.0 is out but for future versions, it would be very helpful if when retrieving a table field value from the generated classes and that field is a null that there is a property at the dataset to suppress the exception from raising. so we can use them anywhere instead if us always handling the the exception. FYI, in each table most of the times that most fields are not mandatory so they can have a null as a normal value.

  • Anonymous
    October 25, 2010
    The comment has been removed