Lambda Expressions vs. Anonymous Methods, Part Four
Hey all, sorry for the long time between posts; I have been crazy busy recruiting, interviewing, fixing bugs, making performance improvements and implementing last-minute changes to the language and expression tree library. The last few posts about lambda binding yielded many interesting comments which I hope to address over the next month or so.
Before that though -- back when I started this series of posts I noted that lambda expressions have an interesting property. Namely, the semantic analysis of the lambda body may depend upon what type the lambda is being converted to. This seemingly trivial fact has an important impact upon the compiler performance during analysis. At long last, let me demonstrate the impact this fact has.
Consider the following set of overloads:
int M(Func<int, int> f){ /* whatever ... */ }
string M(Func<string, string> f){ /* whatever ... */ }
Now suppose we have a call with a lambda argument:
M(x=>x.ToUpper());
During overload resolution we make a list of all the potential overloads and we see which are applicable. That is, which overloads have the property that all of the arguments are convertible to the types of their corresponding formal parameters. Then of all the applicable methods, we pick the best match if there happens to be more than one.
Therefore we must try this binding two ways. We try it as though it were
M((int x)=>x.ToUpper());
and
M((string x)=>x.ToUpper());
The former produces an error while binding the body because there is no ToUpper method on int. The latter binds the body correctly and the return type matches the expected return type, so this one succeeds. The first M is not applicable so it is discarded, the second one is applicable, so it is chosen, all is good in the world.
Or is it?
What if we did something crazy like this?
M(x1=>M(x2=>x1.ToLower() + x2.ToUpper());
OK, so now what do we do? We try to do overload resolution on the outer call. Again, we must try both int and string as candidates for x1. But then when we get to the outer body, we are faced with the same problem for x2.
We have to try all four possibilities for {x1, x2} (that is, {int, int}, {int, string}, {string, int} and {string, string}) in order to determine how many of them work, and of the ones that work, which one is the best. In this particular example "both are string" is the only one that works. (We could with a little more cleverness come up with examples where there were multiple possible matches that worked.)
So perhaps you see where this is going. If there are m overloads that are nested in this manner n deep, then the compiler must try mn possibilities in order to determine which is the unique best one. If it's 22 = 4, that's not a big deal. If we triple each and go to 66 = 46656, that's rather a lot of binding to do, and it takes up rather a lot of time and memory.
Does this seem like a contrived example? Unfortunately, it is not. In the original prototype of C# 3.0, nested from clauses in queries were translated into nested lambdas in exactly this manner, and most of the translation methods have at least two overloads, some have many more. We were easily getting into situations where simple queries could involve millions of body bindings. Fortunately we realized the problem in time; my colleague Wes came up with an alternative translation strategy which does not generate nested lambdas so aggressively and we therefore avoid this problem in the common case. You can still get into it yourself though; please do not.
Next time I'll create a truly contrived (and rather hilarious) example which demonstrates just how bad things can get and just how hard you can make the compiler work to do overload resolution on lambdas.
Comments
Anonymous
March 26, 2007
Extension methods are another potential source for adding potential signatures. I assume that lambda expressions take extension methods into account when resolving signatures. For example, Add a ToUpper to In32 (whatever that may mean...) and M((int x)=>x.ToUpper()); should resolve fine.Anonymous
March 26, 2007
Sure. Almost all of the rules of semantic analysis are the same inside a lambda body as in a normal method.Anonymous
March 27, 2007
If the compiler has a choice between an extension method and a native method when trying to resolve an overload, which one will it pick? Or will it throw an exception? Let's use the example that "Name (required)" provided and say that M((int x)=>x.ToUpper()); resolves because we've created a ToUpper() method that takes and returns an Integer. What happens?Anonymous
March 27, 2007
If there are two or more applicable instance methods and we cannot determine which is better, we do NOT start looking for instance methods. We give an error as usual. If there are one or more applicable instance methods and we can determine which is best, we bind to the best instance method. Otherwise, there must be zero applicable instance methods. We look for extension methods. The rules for how we look for extension methods are complicated and I can go into them in detail if you would like. Suffice to say that we start looking in the current namespace for an extension method with the right name. If we can't find one then we step out one level of namespace, etc.Anonymous
March 27, 2007
The comment has been removedAnonymous
March 28, 2007
Thanks for the insight, Eric. Your post was extremely informative as usual. I'm a bit surprised that you don't favor instance methods over extension methods as a way to disambiguate method calls. I can understand the decision, but might it have been better to favor the instance method and give a compiler warning about the ambiguity? This would avoid unforseen breaking changes due to the introduction of an extension method. Was there a debate over this sort of thing on the compiler team? Is it your opinion that any ambiguity is bad enough that the code should not compile?Anonymous
March 28, 2007
We DO favour instance methods over extension methods when disambiguating method calls. I think I was insufficiently clear in my description above. What we need to do is determine whether the int or the string version of M (or both) is applicable. Clearly the string version is a candidate. If there is an extension method on int such that the int version of the lambda can be successfully bound, then it is a candidate too. At this point we now need to decide which overload is better, and in this case neither is better. We do not consider any of the reasons why a particular body bound successfully when deciding which is better. We do not say "well, this body bound with twelve extension methods, but this other version of the body bound with only six extension methods, so the version with six is better". "Betterness" is determined solely by examining the types and the conversions, not any details of why the conversion succeeded. However, if we are in a situation where we have to choose between an instance method and an extension method, it is not even a choice. If there is ANY applicable instance method, we either choose it or produce an ambiguity error. We NEVER choose an applicable extension method over an applicable instance method.Anonymous
September 06, 2007
Last time I demonstrated that the compiler could have to do an exponential number of bindings in orderAnonymous
June 15, 2008
NOTE: If you haven't read the first post in this series, I would encourage you do to that first