DLR Trees (Part 1)

I'd like to take a detour from discussing the DLR's type system to start talking about implementation. I hope that this will make it easier to talk about some difficult questions like what is the right meta-model for the DLR type system. Let's start by compiling the traditional factorial function for a DLR-based language - using JavaScript for today.

 function factorial(n) {
    if (n <= 1) {
        return 1;
    } else {
        return n * fact(n - 1);
    }
}

The JavaScript compiler will take this code and compile it into the following tree form and pass the tree to the DLR. 

The key implementation trick in the DLR is using these kinds of trees to pass code around as data and to keep code in an easily analyzable and mutable form as long as possible. Anyone who's programmed in Lisp or Scheme knows this trick extremely well. Much more recently, this idea has resurfaced as one of the central features in C# 3 and VB.NET 9 that enable LINQ. Linq uses these "expression trees" to capture the semantics of a given block of code in a sufficiently abstract way that it can be optimized to run in different contexts. For example, the DLINQ SQL interfaces can convert that code into a SQL query that will be optimized by the database engine on the back-end. More radical projects like PLINQ are exploring how to take these same kinds of trees and rearrange the code to execute in parallel and on multiple cores when possible.

When we started building the DLR we knew that we wanted a shared tree form, but we believed that we wanted purely untyped and late-bound nodes in the tree that would always use dynamic actions for their operations. This matched what we had for IronPython where the only typed node in the tree was the ConstantExpression - which had the type of whatever its constant value was. Because our trees were untyped, we didn't believe that they shared much in common with the always strongly typed expression trees in C# 3 and we went about designing these trees as something completely independent.

As we started adding more language implementations, we found that it was difficult to capture different languages idiosyncratic concepts in the DLR tree. Examples of this range from Python's print statement to JavaScript's regular expression literals. We needed to support these language features, but we couldn't add explicit support for them into our trees.  Well, clearly we could have added explicit support for them in the tree, but where would that have ended? Ultimately, if the DLR is going to be successful it will have to be general enough that language implementations can represent their own concepts clearly even if there is no direct DLR equivalent.

For Python's print statement, we knew that IronPython already compiled this into a call to a static method that handled all of the implementation details of printing in Python - including the dreaded softspace. We could support this by adding a static method call node to the DLR tree and by converting the Python print statement into a static method call to this helper method. Similarly, when we went to support the special Python constant for an ellipsis, "...", we knew that the existing IronPython implementation compiled this into a static field access of the singleton ellipsis instance. We could support this by adding a static field access node to the DLR tree and converting the Python ellipsis instance into one of these nodes.

Pretty soon we realized that the nodes that we were adding mapped directly onto the statically typed and bound expression tree nodes from C# 3. This was a good thing, since it meant that there was less for us to create from scratch and more opportunity for us to steal from existing ideas. The DLR trees today in theory include all of the expression tree nodes from C# 3 and add to them several additional kinds of nodes to handle dynamic operations, variables and name binding, and control flow. If you look at the code you'll notice that in fact we only support a subset of the C# 3 nodes and that our APIs don't exactly match the existing expression trees. This is simply an artifact of our late realization that these trees were so closely related and we're now in the process of reconciling our trees with the existing expression trees.

The value of having dynamic languages target this tree form is that we can perform lots of optimizations in the DLR layer on behalf of the language implementations. Some of these optimizations include variable storage where we can often optimize local variables as CLR local variables that can be JITed into machine registers - but where we can also detect explicit meta-programming such as Python's locals() function or JavaScript's arguments feature and fall back to slower and more explicit storage mechanisms such as tuples or object fields when needed. Similarly for the dynamic action nodes we can use different implementation techniques to optimize these important but potentially slow operations.

This post is already too long and it's been over a week since my last post.  I'm going to finish up for today and save any more details about the optimizations we perform on these trees for the future. For those of you looking at code, the DLR trees can be found in Microsoft.Scripting.Ast.

Comments

  • Anonymous
    May 15, 2007
    How do you guys think about DLR trees vs something like System.Expressions?

  • Anonymous
    May 15, 2007
    That is indeed a powerful technique. I use a similar one for Dejavu/Geniusql to translate Python expressions into SQL. Would your DLR implementation be easier if Python had a first-class "expression tree" object? I'd love to work on a PEP for such a beast. I'm stuck passing lambdas into my Expression objects for the moment (although I expect to support genexps soon for the common case of relation-attribute-restriction 3-tuples like SQL SELECT uses). And having a simple symbolic syntax instead of the "lambda" keyword would make that kind of thing much more readable...

  • Anonymous
    May 15, 2007
    small typo in your code, should be factorial(n-1) not fact(n-1) oh i forgot dynamic-language guys enjoy finding errors at runtime... keeps things interesting   :)

  • Anonymous
    May 15, 2007
    The comment has been removed

  • Anonymous
    May 15, 2007
    I am sure MS has patents on the DLR, I wonder if they will indeed start suing people who use LISP or run programs written in LISP. Actually a more interesting question is whether MS will sue you if you write an open source application using the DLR.

  • Anonymous
    May 16, 2007
    The comment has been removed

  • Anonymous
    May 16, 2007
    Y'know, using a comment to point out a similarity between "code as tree" and Lisp isn't at all clever or insightful when the poster himself wrote "Anyone who's programmed in Lisp or Scheme knows this trick extremely well." Not to mention that parsing code into an expression tree is the front end of pretty much every compiler since, I don't know, the introduction of BNF... One would (strenuously) hope any associated patents would be shouted down by the overabundance of prior art.

  • Anonymous
    May 21, 2007
    The comment has been removed

  • Anonymous
    May 23, 2007
    Hi Jim, from the sample you posted it looks that DLR treats expressions and statements as different types, so I'm wondering if your AST would be able to represent following F# code: let a = 1 +  if (b=c) then    print_string "side effect 1!";    3  else    print_string "side effect 2!";    4; In F# everything is an expression, so you can embed statements anywhere in the tree. You can write code with same semantics in JavaScript, so there will be some way for representing it, but it's quite an ugly trick ;-): var a = (b==c)?  (new function() {    print_string "side effect 1!";    return 3;  })():(new function() {    print_string "side effect 2!";    return 4;  })() Tomas

  • Anonymous
    June 02, 2007
    I'm waiting for Part 2 .........

  • Anonymous
    September 16, 2008
    Gracias a la gente de Microsoft de Argentina, en especial al bueno de Miguel Saez , tendré el gran gusto

  • Anonymous
    December 02, 2008
    You have to love a hotel where you have the amazing Lavazza coffee in your hotel room. CLR/Silverlight/Orcas/C# One of the things that has been a bit fustrating over the years with the CLR is the limitation of one CLR per process. Specifically, where