Tip 28 - How to implement an Eager Loading strategy

Background:

Over the last 2 years lots of people have complained about the way Eager loading works in the Entity Framework, or rather the way you ask the Entity Framework to eagerly load.

Here is how you do it:

var results = from b in ctx.Blogs.Include("Posts")
              where b.Owner == "Alex"
              select b;

This snippet asks the EF to eager load each matching Blog’s Posts, and it works great.

The problem is the ‘Posts’ string. LINQ in general, and LINQ to SQL in particular, have spoilt us: we all now expect type safety everywhere, and a string is, well… not type safe.

Instead everyone wants something like this:

var results = from b in ctx.Blogs.Include(b => b.Posts)
              where b.Owner == "Alex"
              select b;

This is a lot safer. And a number of people have tried something like this before, including my mate Matthieu.

But even better would be something like this:

var strategy = new IncludeStrategy<Blog>();
strategy.Include(b => b.Owner);

var results = from b in strategy.ApplyTo(ctx.Blogs)
              where b.Owner == "Alex"
              select b;

The advantage here is that you can re-use strategies between queries.

Design Goals:

So I decided I wanted to have a play myself and extend this idea to support strategies.

Here are the types of things I wanted to support:

var strategy = Strategy.NewStrategy<Blog>();
strategy.Include(b => b.Owner)
        .Include(p => p.Comments); //sub includes
strategy.Include(b => b.Posts); //multiple includes

The ability to sub-class the strategy class

public class BlogFetchStrategy : IncludeStrategy<Blog>
{
    public BlogFetchStrategy()
    {
        this.Include(b => b.Owner);
        this.Include(b => b.Posts);
    }
}

so you can do things like this:

var results = from b in new BlogFetchStrategy().ApplyTo(ctx.Blogs)
              where b.Owner == "Alex"
              select b;

Implementation:

Here is how I implemented this:

1) Create the IncludeStrategy<T> class:

public class IncludeStrategy<TEntity>
    where TEntity : class, IEntityWithRelationships
{
    private List<string> _includes = new List<string>();

    public SubInclude<TNavProp> Include<TNavProp>(
        Expression<Func<TEntity, TNavProp>> expr
    ) where TNavProp : class, IEntityWithRelationships
    {
        return new SubInclude<TNavProp>(
            _includes.Add,
            new IncludeExpressionVisitor(expr).NavigationProperty
        );
    }

    public SubInclude<TNavProp> Include<TNavProp>(
        Expression<Func<TEntity, EntityCollection<TNavProp>>> expr
    ) where TNavProp : class, IEntityWithRelationships
    {
        return new SubInclude<TNavProp>(
            _includes.Add,
            new IncludeExpressionVisitor(expr).NavigationProperty
        );
    }

    public ObjectQuery<TEntity> ApplyTo(ObjectQuery<TEntity> query)
    {
        var localQuery = query;
        foreach (var include in _includes)
        {
            localQuery = localQuery.Include(include);
        }
        return localQuery;
    }
}

Notice that there is a list of strings that holds the Includes we want. And notice that the ApplyTo(…) method allows you to register the Includes with an ObjectQuery<T>, so long as the T’s match.

But of course the bulk of the work is in the two Include(..) methods.

There are two because I wanted to have one for including References and one for including Collections. These implementations are designed to work with .NET 3.5 SP1, so I can rely on classes that have relationships (the only type for which Include makes sense) implementing IEntityWithRelationships. Hence the use of generic constraints.

One thing that is interesting is that for the Include method for Collections, even though the Expression is Expression<Func<TEntity, EntityCollection<TNavProp>>>, the object returned for creating sub-includes is typed to TNavProp. This allows us to neatly bypass needing to interpret expressions like this:

Include(b => b.Posts.SelectMany(p => p.Author));

or invent some sort of DSL like this:

Include(b => b.Posts.And().Author);

By instead doing this:

Include(b => b.Posts).Include(p => p.Author);

This is much, much easier to implement and, I would argue, to use too.

This idea is central to the whole design.

2) The IncludeExpressionVisitor is a class derived from a copy of the ExpressionVisitor sample you can find here. It is very simple; in fact it is so simple that a visitor is probably overkill here, but I wanted to bone up on the correct patterns:

public class IncludeExpressionVisitor : ExpressionVisitor
{
    private string _navigationProperty = null;

    public IncludeExpressionVisitor(Expression expr)
    {
        base.Visit(expr);
    }

    public string NavigationProperty
    {
        get { return _navigationProperty; }
    }

    protected override Expression VisitMemberAccess(
        MemberExpression m
    )
    {
        PropertyInfo pinfo = m.Member as PropertyInfo;

        if (pinfo == null)
            throw new Exception(
                "You can only include Properties");

        if (m.Expression.NodeType != ExpressionType.Parameter)
            throw new Exception(
                "You can only include Properties of the Expression Parameter");

        _navigationProperty = pinfo.Name;

        return m;
    }

    protected override Expression Visit(Expression exp)
    {
        if (exp == null)
            return exp;
        switch (exp.NodeType)
        {
            case ExpressionType.MemberAccess:
                return this.VisitMemberAccess(
                    (MemberExpression)exp
                );
            case ExpressionType.Lambda:
                return this.VisitLambda((LambdaExpression)exp);
            default:
                throw new InvalidOperationException(
                    "Unsupported Expression");
        }
    }
}

As you can see this visitor is fairly constrained: it only recognizes LambdaExpressions and MemberExpressions. When visiting a MemberExpression it checks that the Member being accessed is a Property, and that the member is bound directly to the parameter (i.e. p.Property is okay but p.Property.SubProperty is not). Once it is happy it records the name of the NavigationProperty.
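Since the visitor only ever handles a single property access on the lambda parameter, the same checks can be sketched without a visitor at all. The following is a rough, self-contained illustration and not part of the original sample: the NavigationPropertyName helper and the Blog and Person classes are hypothetical stand-ins, and it uses nothing beyond System.Linq.Expressions.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical stand-in entities, used only for this illustration.
public class Person { public string Name { get; set; } }
public class Blog { public Person Owner { get; set; } }

// Hypothetical helper (not in the original sample): it applies the same
// two checks as IncludeExpressionVisitor by inspecting the lambda body directly.
public static class NavigationPropertyName
{
    public static string Of<TEntity, TProp>(Expression<Func<TEntity, TProp>> expr)
    {
        // The body must be a member access such as b.Owner.
        var member = expr.Body as MemberExpression;
        if (member == null)
            throw new InvalidOperationException("Unsupported Expression");

        // The member must be a property, not a field.
        if (!(member.Member is PropertyInfo))
            throw new InvalidOperationException("You can only include Properties");

        // The property must hang directly off the lambda parameter,
        // so b.Owner is accepted but b.Owner.Name is rejected.
        if (member.Expression == null ||
            member.Expression.NodeType != ExpressionType.Parameter)
            throw new InvalidOperationException(
                "You can only include Properties of the Expression Parameter");

        return member.Member.Name;
    }
}

public static class NavigationPropertyNameDemo
{
    public static void Main()
    {
        // prints "Owner"
        Console.WriteLine(NavigationPropertyName.Of((Blog b) => b.Owner));

        try
        {
            NavigationPropertyName.Of((Blog b) => b.Owner.Name);
        }
        catch (InvalidOperationException ex)
        {
            // Nested access is rejected, just as the visitor rejects it.
            Console.WriteLine(ex.Message);
        }
    }
}
```

The visitor version in the post remains the better starting point if you later want to recognise more expression shapes; this shortcut only covers the single case the post relies on.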

3) Once we know the NavigationProperty name the IncludeStrategy.Include methods create a SubInclude<T> object. This is responsible for registering our intent to include the NavigationProperty, and provides a mechanism for chaining more sub-includes.

The SubInclude<T> class looks like this:

public class SubInclude<TNavProp>
    where TNavProp : class, IEntityWithRelationships
{
    private Action<string> _callBack;
    private string[] _paths;

    internal SubInclude(Action<string> callBack, params string[] path)
    {
        _callBack = callBack;
        _paths = path;
        _callBack(string.Join(".", _paths));
    }

    public SubInclude<TNextNavProp> Include<TNextNavProp>(
        Expression<Func<TNavProp, TNextNavProp>> expr
    ) where TNextNavProp : class, IEntityWithRelationships
    {
        string[] allpaths = _paths.Append(
            new IncludeExpressionVisitor(expr).NavigationProperty
        );

        return new SubInclude<TNextNavProp>(_callBack, allpaths);
    }

    public SubInclude<TNextNavProp> Include<TNextNavProp>(
        Expression<Func<TNavProp, EntityCollection<TNextNavProp>>> expr
    ) where TNextNavProp : class, IEntityWithRelationships
    {
        string[] allpaths = _paths.Append(
            new IncludeExpressionVisitor(expr).NavigationProperty
        );

        return new SubInclude<TNextNavProp>(_callBack, allpaths);
    }
}
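To see which strings a chain of includes actually registers, here is a small self-contained sketch. PathRecorder is a hypothetical stand-in for SubInclude<T> that takes property names as plain strings instead of expressions, so it runs without the Entity Framework; the constructor logic mirrors the one above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for SubInclude<T>: same callback-in-constructor
// behaviour, but property names are passed as strings to keep it runnable
// without any Entity Framework types.
public class PathRecorder
{
    private readonly Action<string> _callBack;
    private readonly string[] _paths;

    public PathRecorder(Action<string> callBack, params string[] path)
    {
        _callBack = callBack;
        _paths = path;
        // Every new link in the chain registers its full dotted path.
        _callBack(string.Join(".", _paths));
    }

    public PathRecorder Include(string navigationProperty)
    {
        // Copy the existing path and append the next property name.
        var allPaths = new List<string>(_paths) { navigationProperty };
        return new PathRecorder(_callBack, allPaths.ToArray());
    }
}

public static class PathRecorderDemo
{
    public static List<string> Record()
    {
        var includes = new List<string>();
        // Equivalent in spirit to Include(b => b.Posts).Include(p => p.Author)
        new PathRecorder(includes.Add, "Posts").Include("Author");
        return includes;
    }

    public static void Main()
    {
        // prints "Posts, Posts.Author"
        Console.WriteLine(string.Join(", ", Record().ToArray()));
    }
}
```

Note that the chain registers both "Posts" and "Posts.Author". As far as I know, including "Posts.Author" already loads Posts as well, so the shorter entry should be redundant but harmless.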

4) Now the only thing missing is a little extension method I wrote to append another element to an array, which looks something like this:

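// Extension methods must live in a static class; the class name here is arbitrary.
public static class ArrayExtensions
{
    public static T[] Append<T>(this T[] initial, T additional)
    {
        List<T> list = new List<T>(initial);
        list.Add(additional);
        return list.ToArray();
    }
}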

With this code in place you can write your own eager loading strategy classes very easily, simply by deriving from IncludeStrategy<T>.

All the code you need is in this post, but please bear in mind this is just a sample. It is NOT an official Microsoft release, and as such has not been rigorously tested.

If you accept that I'm just a Program Manager, and I'm eminently fallible, and you *still* want to try this out, you can download a copy of the source here.

Enjoy.

EagerLoading.zip

Comments

  • Anonymous
    July 24, 2009
    Nice! I gave something similar a try a while ago but I must admit I failed big time on the SubIncludes! Going to try yours out. Many have tried and semi-failed before you :)
  • Anonymous
    July 25, 2009
    Interesting implementation! I will have a deeper look at what you provided here. Nice idea by the way.
  • Anonymous
    July 26, 2009
    I would love to see something along these lines baked into the framework.  Avoiding "magic strings" and allowing re-use is definitely a Good Thing.
  • Anonymous
    July 26, 2009
    Please keep up the cool tips. EF is going great, no matter what some peeps say. I haven't used anything this powerful since I learned to count SQL group by havings. Thanks!
  • Anonymous
    August 04, 2009
    Wow, this is perfect. This is the most elegant solution I have seen so far. Any clue if this will work in next version of Entity framework? Thanks
  • Anonymous
    August 04, 2009
    @Daniel, If you use default code-gen (i.e. classes that derive from EntityObject) this will work in 4.0 too. However if you write POCO classes, you'll have to make a few changes to the API, but the principle will still work. The key will be to remove the IEntityWithRelationships and EntityCollection from the generic method constraints. Alex
  • Anonymous
    August 12, 2009
    Hi Alex, I was looking for something similar, but can you please help me with how I can do something like this:
    DataEntities de = new DataEntities();
    var query = de['TableName'].Select(col1,Table2.Col1);
    This is required because the entity name and the field will be known at runtime. I tried doing this: I have added this indexer property to the main EntityContext class, which returns the table which I want to query.
    public global::System.Data.Objects.ObjectQuery<System.Data.Objects.DataClasses.EntityObject> this[string indexer]
    {
        get
        {
            string entity = "[" + indexer + "]";
            return base.CreateQuery<System.Data.Objects.DataClasses.EntityObject>(entity);
        }
    }
    With this entity I can hard code the columns in the Select method, something like
    var query = dce["Table2"].Select("col2");
    But I want that the Select Method accepts the entity objects which are having relationship so that Entity Framework can do join based query for me automatically. Please help me on this Alex. Thanks a lot.
  • Anonymous
    August 13, 2009
    Ashish, I was following your question right up to the end, but you lost me with this: "But I want that the Select Method accepts the entity objects which are having relationship so that Entity Framework can do join based query for me automatically." Can you give me an example of the code you want to write? It would really help me. Also if you can use an example model in your code snippets it will really help me understand what you want, i.e. Person.Mother. -Alex
  • Anonymous
    August 20, 2009
    Great blog, man. I have a question on stackoverflow that nobody could answer, related to include and recursive hierarchies, maybe you could help me ;) http://stackoverflow.com/questions/1308158 thanks!
  • Anonymous
    August 26, 2009
    This would be great, waiting for it
  • Anonymous
    September 02, 2009
    Hi, I only have the ID of the object to be deleted. I want to issue an UPDATE stmt on the DB so that the is_deleted field in the object gets marked as true, thereby soft-deleting the object. I could do:
    c = context.Customers.Where(Customer => Customer.id==123);
    context.Attach(c);
    c.is_deleted = true;
    context.SaveChanges();
    But this would mean firing a SELECT stmt and then an UPDATE that updates ALL the columns. What I would like to do is fire an UPDATE on a single column. I use .NET 3.5 sp1. Regards, Yash
  • Anonymous
    September 02, 2009
    Yash, Well it is occasionally possible to do updates without getting all the fields; check out Tip 15 for more. Alex
  • Anonymous
    September 03, 2009
    The comment has been removed
  • Anonymous
    September 04, 2009
    Yash, Yes you are recommending a strategy used by some ORMs. It is definitely something we should consider for future versions of the Entity Framework. The complication is the streaming nature of LINQ, i.e. you would have to load all datatables before yielding any entities, which is somewhat counter to the idea of IEnumerable<> and IQueryable<>. Cheers, Alex
  • Anonymous
    August 17, 2010
    I tried smth similar and saw it broken when trying to use compiled queries. Can you advise where to dig to have it working in compiled linq as well?
  • Anonymous
    August 25, 2010
    nayato - sorry there is nothing special you can do to make this work with CompiledQuery... short of adding surface to collect all the Include strings from the strategy and apply them when constructing the CompiledQuery. Alex
  • Anonymous
    September 16, 2010
    Thought I'd share two helpful extension methods I wrote. One wraps the "ApplyTo" method back into "Include". The other is to make it quick-n-simple when you only need to include one property in your query.
    public static ObjectQuery<TEntity> Include<TEntity>(this ObjectQuery<TEntity> query, IncludeStrategy<TEntity> includeStrategy)
        where TEntity : class, IEntityWithRelationships
    {
        return includeStrategy.ApplyTo(query);
    }
    public static ObjectQuery<TEntity> Include<TEntity, TNavProp>(this ObjectQuery<TEntity> query, Expression<Func<TEntity, EntityCollection<TNavProp>>> expr)
        where TEntity : class, IEntityWithRelationships
        where TNavProp : class, IEntityWithRelationships
    {
        var strategy = new IncludeStrategy<TEntity>();
        strategy.Include(expr);
        return query.Include(strategy);
    }
  • Anonymous
    March 22, 2011
    Hi, How can I use your strategy to implement something like this?
    select customer.*, order.id, orderdetails.productid
    where customer.id = order.customerid
    and order.id = orderdetails.orderid
    and orderdetails.productid in
    (select products.id from supplier, products
    where supplier.id = products.supplierid
    and supplier.name = "SomeName")
    Also how to extend your strategy to implement paging functionality?