Generated SQL Improvements for TPT Queries (June CTP)

 


The information in this post is out of date.

Visit msdn.com/data/ef for the latest information on current and past releases of EF.


 

Last year we wrote about some performance considerations when using TPT inheritance in the Entity Framework.  We are pleased to announce that with the Microsoft Entity Framework June 2011 CTP we have released the first round of improvements, resulting in dramatic improvements in queries against TPT hierarchies.  While there is still room for improvement, we have completed some of the most difficult changes.  In many cases we are able to exclude tables from the generated query that do not contribute to the result.  Furthermore, many of the UNION ALL occurrences have been replaced with LEFT OUTER JOINs or eliminated completely, resulting in generated SQL that closely resembles what a developer might handcraft.

For example, this model was used in the original blog post:

 

 

 

Two sample queries were used in that blog post – a simple projection against properties exclusively on the base type and an even simpler query that selects all of the guests.  Let’s look at the effect of the improvements on those two queries:

 

 

var query = from g in db.Guests
select new { Id = g.GuestId, Name = g.Name };

 

Before

SELECT

[Extent1].[GuestId] AS [GuestId],

[Extent1].[Name] AS [Name]

FROM  [dbo].[Guests] AS [Extent1]

LEFT OUTER JOIN  (SELECT

       [Extent2].[GuestId] AS [GuestId]

       FROM [dbo].[Guests_USGuest] AS [Extent2]

UNION ALL

       SELECT

       [Extent3].[GuestId] AS [GuestId]

       FROM [dbo].[Guests_UKGuest] AS [Extent3]) AS [UnionAll1] ON [Extent1].[GuestId] = [UnionAll1].[GuestId]

After

SELECT

[Extent1].[GuestId] AS [GuestId],

[Extent1].[Name] AS [Name]

FROM [dbo].[Guests] AS [Extent1]

 

In this query we removed an unnecessary left outer join and a union all, resulting in a query that is 33% of the original size and much easier to read.

 

var query = db.Guests;

Before

SELECT

CASE WHEN (( NOT (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL))) AND ( NOT (([UnionAll1].[C4] = 1) AND ([UnionAll1].[C4] IS NOT NULL)))) THEN '0X' WHEN (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL)) THEN '0X0X' ELSE '0X1X' END AS [C1],

[Extent1].[GuestId] AS [GuestId],

[Extent1].[Name] AS [Name],

[Extent1].[Address] AS [Address],

[Extent1].[City] AS [City],

CASE WHEN (( NOT (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL))) AND ( NOT (([UnionAll1].[C4] = 1) AND ([UnionAll1].[C4] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL)) THEN [UnionAll1].[State] END AS [C2],

CASE WHEN (( NOT (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL))) AND ( NOT (([UnionAll1].[C4] = 1) AND ([UnionAll1].[C4] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL)) THEN [UnionAll1].[Zip] END AS [C3],

CASE WHEN (( NOT (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL))) AND ( NOT (([UnionAll1].[C4] = 1) AND ([UnionAll1].[C4] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL)) THEN [UnionAll1].[PhoneNumber] END AS [C4],

CASE WHEN (( NOT (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL))) AND ( NOT (([UnionAll1].[C4] = 1) AND ([UnionAll1].[C4] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) ELSE [UnionAll1].[C1] END AS [C5],

CASE WHEN (( NOT (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL))) AND ( NOT (([UnionAll1].[C4] = 1) AND ([UnionAll1].[C4] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) ELSE [UnionAll1].[C2] END AS [C6]

FROM  [dbo].[Guests] AS [Extent1]

LEFT OUTER JOIN  (SELECT

      [Extent2].[GuestId] AS [GuestId],

      [Extent2].[State] AS [State],

      [Extent2].[Zip] AS [Zip],

      [Extent2].[PhoneNumber] AS [PhoneNumber],

      CAST(NULL AS varchar(1)) AS [C1],

      CAST(NULL AS varchar(1)) AS [C2],

      cast(1 as bit) AS [C3],

      cast(0 as bit) AS [C4]

      FROM [dbo].[Guests_USGuest] AS [Extent2]

UNION ALL

      SELECT

      [Extent3].[GuestId] AS [GuestId],

      CAST(NULL AS varchar(1)) AS [C1],

      CAST(NULL AS varchar(1)) AS [C2],

      CAST(NULL AS varchar(1)) AS [C3],

      [Extent3].[PostalCode] AS [PostalCode],

      [Extent3].[PhoneNumber] AS [PhoneNumber],

      cast(0 as bit) AS [C4],

      cast(1 as bit) AS [C5]

      FROM [dbo].[Guests_UKGuest] AS [Extent3]) AS [UnionAll1] ON [Extent1].[GuestId] = [UnionAll1].[GuestId]

After

SELECT

CASE WHEN (( NOT (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL))) AND ( NOT (([Project2].[C1] = 1) AND ([Project2].[C1] IS NOT NULL)))) THEN '0X' WHEN (([Project2].[C1] = 1) AND ([Project2].[C1] IS NOT NULL)) THEN '0X0X' ELSE '0X1X' END AS [C1],

[Extent1].[GuestId] AS [GuestId],

[Extent1].[Name] AS [Name],

[Extent1].[Address] AS [Address],

[Extent1].[City] AS [City],

CASE WHEN (( NOT (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL))) AND ( NOT (([Project2].[C1] = 1) AND ([Project2].[C1] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([Project2].[C1] = 1) AND ([Project2].[C1] IS NOT NULL)) THEN [Project2].[PostalCode] END AS [C2],

CASE WHEN (( NOT (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL))) AND ( NOT (([Project2].[C1] = 1) AND ([Project2].[C1] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([Project2].[C1] = 1) AND ([Project2].[C1] IS NOT NULL)) THEN [Project2].[PhoneNumber] END AS [C3],

CASE WHEN (( NOT (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL))) AND ( NOT (([Project2].[C1] = 1) AND ([Project2].[C1] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([Project2].[C1] = 1) AND ([Project2].[C1] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) ELSE [Project1].[State] END AS [C4],

CASE WHEN (( NOT (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL))) AND ( NOT (([Project2].[C1] = 1) AND ([Project2].[C1] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([Project2].[C1] = 1) AND ([Project2].[C1] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) ELSE [Project1].[Zip] END AS [C5],

CASE WHEN (( NOT (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL))) AND ( NOT (([Project2].[C1] = 1) AND ([Project2].[C1] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([Project2].[C1] = 1) AND ([Project2].[C1] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) ELSE [Project1].[PhoneNumber] END AS [C6]

FROM   [dbo].[Guests] AS [Extent1]

LEFT OUTER JOIN  (SELECT

      [Extent2].[State] AS [State],

      [Extent2].[Zip] AS [Zip],

      [Extent2].[PhoneNumber] AS [PhoneNumber],

      [Extent2].[GuestId] AS [GuestId],

      cast(1 as bit) AS [C1]

      FROM [dbo].[Guests_USGuest] AS [Extent2] ) AS [Project1] ON [Extent1].[GuestId] = [Project1].[GuestId]

LEFT OUTER JOIN  (SELECT

      [Extent3].[PostalCode] AS [PostalCode],

      [Extent3].[PhoneNumber] AS [PhoneNumber],

      [Extent3].[GuestId] AS [GuestId],

      cast(1 as bit) AS [C1]

      FROM [dbo].[Guests_UKGuest] AS [Extent3] ) AS [Project2] ON [Extent1].[GuestId] = [Project2].[GuestId]

 

At first glance this query may not show much improvement, however, the UNION ALL has been replaced by a LEFT OUTER JOIN.  Using a LEFT OUTER JOIN rather than a UNION ALL allows tables that don’t contribute to the result to be removed from the query entirely (as in the first example).

 

Query Improvements Summary

In this round of improvements we focused on restricting generated SQL to tables that actually contribute to the query and replacing UNION ALL with LEFT OUTER JOIN.  As mentioned earlier these were some of the most difficult challenges.

 

Conclusion

We are excited to be able to deliver these improvements as part of the Microsoft Entity Framework June 2011 CTP and we look forward to continued efforts in improving generated SQL.  As always we encourage and appreciate your feedback on these features.  What can we do better?  Are there other notable areas where SQL optimization could significantly improve the Entity Framework?  Please leave your thoughts and comments below.

 

ADO.NET Entity Framework Team

Comments

  • Anonymous
    July 25, 2011
    Performance is the thing that absolutely matters most when an application needs to handle a lot of load. I believe the EF team realizes that they will run into a bunch of edge cases (not specific to this post) so keep on improving and rethinking the generated SQL code. The focus on generated SQL code along with the necessary upcoming features are the things that will make EF a valid number one choice when choosing an O/RM for a big application. I must say that I really like the way EF is evolving. Congrats to the team!

  • Anonymous
    July 26, 2011
    The comment has been removed

  • Anonymous
    July 28, 2011
    @Vijay, can you post a sample query or point us to a forum post?

  • Anonymous
    July 31, 2011
    We use the Include method to eagerly load child entities. The issue is that EF still creates a lot of UNIONs and testing the EF-generated query that against a manually (and an equivalent) created query, it falls way far behind in terms of performance. Because of that we decided to use stored procedures to retrieve our desired model. However we don't know how to do it (load parent and child entities) using the stored procedure approach without hacking around. We are thinking there is an easier way to do it with Code-first as it can be done with Model first appraoch as shown here (blogs.msdn.com/.../walkthrough-multiple-result-sets-from-stored-procedure.aspx).

  • Anonymous
    August 09, 2011
    In EF4 (and AFAIK EF4.1 as well) if I have two tables (in my case Users and Customers) in a 1-0..1 relation and in a LINQ query on context.Users select a few columns from the Users table and several from the Customers (containing some additional info that's only set for some users) the resulting query references the Customers table several times. As many times as there are columns selected. That's insanely inefficient! See social.msdn.microsoft.com/Forums/en-US/adodotnetentityframework/thread/4bd7f23f-730c-496a-b651-76a349a60b65 Has this been fixed in the current CTP? IMNSHO whenever there's a ... dbo.tblFoo as [ExtentX] JOIN dbo.tblFoo as [ExtentY] ON [ExtentY].PrimaryKey = [ExtentX].PrimaryKey ... in the generated query, then one of the extents should be removed and all references to it should be updated to refer to the other one.

  • Anonymous
    August 15, 2011
    Great news that TPT queries are improved. Still, the SQL of the second example is pretty complex. Why doesn't the query look like: SELECT  this_0_.Id AS Id0,  this_0_.Name AS Name0,  this_0_.Address AS Address0,  this_0_.City AS City0,  this_1_.PostalCode AS PostalCode1,  this_1_.PhoneNumber AS PhoneNumber1,  this_2_.State AS State2,this_2_.Zip AS Zip2,  this_2_.PhoneNumber AS PhoneNumber2,  CASE     WHEN this_2_.Id is not null THEN 2     WHEN this_1_.Id is not null THEN 1  END AS type_0 FROM Guests AS this_0_ left outer join Guests_UKGuest AS this_1_ on this_0_.Id=this_1_.Id left outer join Guests_USGuest AS this_2_ on this_0_.Id=this_2_.Id This is much simpler and executes much faster and is also more like NHibernate (if I'm correct). I'm using these kind of queries myself and they work perfect. I'm really interested in understanding why EF doesn't generate queries like this.

  • Anonymous
    August 19, 2011
    Hi If I run this SQL on a client PC; SELECT Applicants.ApplicantID AS Expr1, Applicants.last_name, ApplicantNotes.* FROM Applicants INNER JOIN ApplicantNotes ON Applicants.ApplicantID = ApplicantNotes.ApplicantID it executes the query and returns 3038 records in <=1 seconds. If however I use a similar LINQ in my app as below; Dim Query = From a In Context.Applicants Select a.ApplicantID, a.last_name, a.ApplicantNotes it takes many time longer, in the order of around 1 minute or so. This is proving inconvenient from a user experience point of view. The worst part is that a legacy MS Access app running on the same LAN runs multiple complex queries and returns results from many thousands of records in just a few seconds. This is making it difficult to sell SQL Server app to the client using EF. I can really use some expert insight here to speed up things in this rather simple LINQ scenario.  Apart from the generic advice in this article msdn.microsoft.com/.../cc853327.aspx if any one can give some specific advice for my situation on how to speed up above LINQ, it would be highly appreciated. Loading a single entity via LINQ is much faster. Its only when a related entity is included in LINQ that it gets painfully slow. Many Thanks Regards

  • Anonymous
    August 29, 2011
    I've posted the article "Improving Entity Framework Query Performance Using Graph-Based Querying" on CodeProject. It is available at: www.codeproject.com/.../Improve-Entity-Framework.aspx Amongst others, it shows that performance of TPT queries of the June 2011 CTP release is still far from optimal for non-trivial inheritance hierarchies.

  • Anonymous
    February 02, 2012
    The comment has been removed