Every Tree There Is

[This is part of a series on generating every string in a language. The previous part is here. The next part is here.]

BinaryTrees1Last time we talked about how the number of binary trees with n nodes is C(n), where C(n) is the nth Catalan number. I asked if there were more or fewer trees – not restricted to binary trees – of size n than there are binary trees of size n. If you worked it out, the answer might have surprised you; it is certainly not immediately obvious.BinaryTrees2

First off, a common response I get to this question is immediately "well, since binary trees are a special case of arbitrary trees, there must be more arbitrary trees." Can you see why this argument is wrong? Binary trees have more structure than arbitrary trees! There are two binary trees with two nodes: one with the child on the right of the root, one with the child on the left of the root. There is only one arbitrary tree with two nodes; there is no difference between the "left" and "right" child.

As it turns out, the answer to my question is that except for the trivial cases of zero- and one-node trees, there are always more binary trees of size n than trees of size n.

If you actually work it out though you find an interesting pattern; the number of arbitrary trees of size n is C(n-1) – that is, there are 14 binary trees of size 4, and 14 arbitrary trees of size 5.  There are 1430 binary trees of size 8 and 1430 arbitrary trees of size 9. Weird! (Or, another way of looking at it is that the number of binary trees with n "null leaf nodes" is equal to the number of arbitrary trees with n nodes.)

Clearly this cannot be a coincidence; there must be a one-to-one relationship between the set of binary trees of size n and that of arbitrary trees of size n+1. And in fact the mapping is easy to see if you look at it in just the right way. On the left I’ve drawn a mapping from the first five binary trees of size 4 enumerated by my code of last time and the first five corresponding trees of size 5. (Click on the picture for a larger version.) 

To make it more clear exactly what the relationship is between the two trees, check out the diagram on the right. To turn a binary tree into its corresponding arbitrary tree, you rotate it 45 degrees anticlockwise, stick a root node above the whole thing, and then fix up all the horizontal lines to point to the appropriate parent.

Or, from a data structures point of view, you can represent every node in an arbitrary tree as a reference to the first child and a reference to the next sibling. This is exactly the same as a binary tree, it's just that the labels on the nodes are "first child" and "next sibling", not "left" and "right". The only difference between binary trees and arbitrary trees in this system is that the "right" child of the root is always null, since the root has no siblings. Once you see that the difference between a binary tree and an arbitrary tree is just the names of the fields in the data structure, it becomes apparent that there is a strong relationship between them.

So, interestingly enough, the solution to the second part of my challenge is: we’re already done. Since we have a device which enumerates all binary trees, and binary trees are 1-1 with arbitrary trees, then all we need is a device which can change binary trees to arbitrary trees. Doing so is left as an exercise, but for interest’s sake, here’s a device which takes a binary tree and gives a string representation of the corresponding arbitrary tree, using the compact syntax for an arbitrary tree I described last time.

public static string TreeString(Node node)
{
// Get the arbitrary tree string corresponding to
// a given binary tree.
var sb = new StringBuilder();
Action<Node> f = null;
f = n =>
{
sb.Append("{");
for (Node child = n.Left; child != null; child = child.Right)
f(child);
sb.Append("}");
};
f(new Node(node, null));
return sb.ToString();
}

To get all the trees of size 5, we just ask for all the binary trees of size 4 and print them as trees of size 5:

foreach (var n in AllBinaryTrees(4))
Console.WriteLine(TreeString(n));

and we get

{{}{}{}{}}
{{}{}{{}}}
{{}{{}}{}}
{{}{{}{}}}
{{}{{{}}}}
{{{}}{}{}}
{{{}}{{}}}
{{{}{}}{}}
{{{{}}}{}}
{{{}{}{}}}
{{{}{{}}}}
{{{{}}{}}}
{{{{}{}}}}
{{{{{}}}}}

Notice that if you remove the outer set of braces on each then we have also solved the problem of “print all the legal combinations of four sets of matched braces”! If you can enumerate all the binary trees then it turns out you can enumerate all the solutions to dozens of different equivalent problems.

Next time: what else can we generate all of?

[This is part of a series on generating every string in a language. The previous part is here. The next part is here.]

Comments

  • Anonymous
    April 22, 2010
    I think an interesting variation on this topic (and the last one) would be to modify the algorithm to generate all "structurally" unique trees - excluding those that contain reflections of nodes within the structure. There are far fewer unique trees than binary trees. For example, for N=32, there are 55,534,064,877,048,198 binary trees but only 18,632,325,319 unique binary trees. FInding a way to produce unique trees in a streaming manner (not storing prior state) would be an interesting challenge ... and I'm not even sure it's possible.

  • Anonymous
    April 22, 2010
    Thanks for the interesting post as usual.  Below is my version of converting Binary Tree to Arbitrary Tree: static TreeNode BinaryTreeToTree(Node binary) {    return new TreeNode(BinaryNodeToTreeChildren(binary)); } static IEnumerable<TreeNode> BinaryNodeToTreeChildren(Node binary) {    for (Node right = binary; right != null; right = right.Right)        yield return BinaryTreeToTree(right.Left); }

  • Anonymous
    April 22, 2010
    I think this could easily be extended to generate binary expressions (binary trees), and probably for variadic expressions (arbitrary trees) as well.  This would of course be a handy tool to test portions of a compiler...

  • Anonymous
    April 22, 2010
    I think the idea of "since binary trees are a special case of arbitrary trees, then there must be more arbitrary trees" is including the binary trees in the list of arbitrary trees. In other words, if all binary trees are trees--and there are trees that are not binary trees--then there are more (general) trees than binary trees.

  • Anonymous
    April 22, 2010
    This brilliant way of seeing the solution was really humbling for me. I couldn't see this elegant solution myself if my life depended on it ;)

  • Anonymous
    April 22, 2010
    "Next time: what else can we generate all of?" Arbitrary trees with ordered nodes?

  • Anonymous
    April 22, 2010
    The comment has been removed

  • Anonymous
    April 22, 2010
    "Next time: what else can we generate all of?" Maybe sentences/syntax trees of a context free language? :) Maybe! -- Eric

  • Anonymous
    April 22, 2010
    A bit of an irrelevant question: This algorithm places the string representations, at least for the 4/5 case, very nearly in reversed lexical order (with one item two positions out of place) - and where they are out of place it is in the same way for both of them. So my question is: Do these string representations always guarantee that the binary tree and the arbitrary tree's string representation will be in the same place in their respective lexical order* And is there an algorithm that will always generate them in lexical order (other than the obvious case of generating the string representations via a 'what characters are valid in the next position' method and trying them in lexical order) *assuming { comes before }, and that ( comes before ) which both come before x. Interesting fact 2: the arbitrary tree representation can be obtained from the binary tree representation by removing all )'s and changing (x to {}

  • Anonymous
    April 23, 2010
    On the theme of "generating all of...": printing all anagrams of a given input word is a question I sometimes pose to candidates in interviews. It's amazing how many people get hopelessly tangled up by it.

  • Anonymous
    April 23, 2010
    > It's amazing how many people get hopelessly tangled up This is completely off Eric's topic, but it's interesting to me how common this view of technical difficulty is.  Namely, that a problem that's only mildly complex in general should still only be mildly complex to solve in an interview.  I used to think that, too. Over the years, though, I've had a couple of interviews where I drew blanks on (or got "hopelessly tangled up" in) things I not only should have known, but did know cold in my everyday routine.  And certainly I've been on the interviewing side of of plenty of trainwreck responses to even trivial problems, such as "in the language of your choice, write code to traverse a binary tree".  The Coding Horror blog has recurring posts about even simpler sorts of problems causing difficult.   Two things changed my mind.  One, I've heard that the most widely-held fear in the US is public speaking--outranking spiders, needles, planes, internet outtages, etc.  For many, relatedly I think, the interview is NOT a representative sample of their technical skill.   And two, consider the last time you made an important technical presentation (say, to a client or a boss).  Did you choose to do it without having prepared or rehearsed?  If you rehearsed, why? For many, we rehearse because thinking things through in advance, where you can be meticulous at your own speed, is different from trying to produce meticulous results rapidly, for an audience.  And that's when you know what the questions are likely to be in advance! I'm not saying this sort of question is an unfair test of a candidate, or that speaking and thinking on your feet is not an important skill, even for a "heads down" developer.  But I do think that most interviewers underestimate how technical degree-of-difficulty amplifies when introduced in an interview.  They think it says something about the population, and maybe it does--but I think a significance fraction of the result can be explained by sheer anxiety.

  • Anonymous
    April 23, 2010
    Just a note of pedantry; If there are two trees of size 2 (binary),  A in root with B left, A in root with B right... The you could also say that there are four for a structured tree of size 4:  A in root with B in child position 0 (nodes 1-3 empty),  ... A in root with B in child position 3 (nodes 0-2 empty). When reading your question, it was not clear to me if "any arbitrary tree" meant "a tree with an arbitrary structure" or "any structured tree picked arbitrarily".   So I answered "it depends" :)

  • Anonymous
    April 23, 2010
    "Next time: what else can we generate all of?" Eric: how about generalizing from binary trees to arbitrary directed graphs (including cycles). That way you could generate what amounts to control flow graphs of basic blocks. You might think of it as useful for generating control flow graphs for the testing of the back end of a compiler (or decompiler, in my case). However, the number of graphs is likely to grow very fast with the number of nodes. And attempting to filter out the many isomorphic graphs doesn't work as graph isomorphism is hard (not P, but apparently not NP-complete either). Interesting idea. You're right, it's the isomorphic graphs that are hard to identify. Simply generating all the directed graphs with n nodes is easy; there are 2^(n^2) of them. Imagine an n by n matrix of Booleans that says whether there is an edge from x to y; there is one graph for every possible combination of Booleans in that matrix. Easily enumerated: just make an n x n bit integer and count from zero until you're done. -- Eric

  • Anonymous
    April 23, 2010
    The comment has been removed

  • Anonymous
    April 23, 2010
    > No, that is exactly the opposite of what I need. If A is {} and B is {{}} then I want {{}{{}}} and {{{}}{}} to be different trees. An ordered list gives me exactly the ordering I need. My version does distinguish {{}{{}}} from {{{}}{}}, fwiw, so I don't think it's the opposite of what you need.  It would produce the exact results you require.   But, it is unnecessarily complex. I think I've both introduced a wrinkle and compensated for it, when I should have let well enough alone. Consider two graphs:  A / |
    B C D | E And  A / |
    B D C | E I thought a plausible reading of your original definition would say that in the first case, A's list of children is {B, C, D}, in the second it is {B, D, C}.  These are different lists, but you wish them to describe the same tree.   However, this confers more differentiable identity to the nodes than I should. You've shown trees with labels, but the labels have been based on positions, i.e., labeled after order is fixed.  My instinct is to quibble with this, saying, "Yeah, but what if we reverse the order in which we visit the leafs under A; it's the same 'shape' but with different 'order'."  Upon reflection, I think that's begging the question.  It already assumes that node identity is meaningful apart from position.  That's something I cooked up; you don't distinguish nodes in any way other than their subtrees and their order relative to their siblings.  It's reminiscent of the story about the three baseball umpires (which you've featured).  To paraphrase the oldest umpire, your nodes ain't nothing until you've ordered them.   I've probably spent too much time serializing things lately.

  • Anonymous
    April 23, 2010
    It's interesting that for binary trees the following are considered distinct configurations   A             A                      B            B     /                 C                 C but if they are viewed as general trees they are not distinct, it's the same configuration. There's a distinction in the definitions given for the binary and arbitrary trees, the binary tree definition contains node position information (which also implies ordering), but the definition of general trees only contains ordering information. (without position information) If we were to define the the binary trees without any position information, something like : a binary tree is a tree where every node has a maximum of two child nodes (maybe as a linked list) then the number of possible configurations  is reduced by a large factor. (Intuitively I think 2n!, but I need to verify)

  • Anonymous
    April 26, 2010
    Generating all possible pictures: http://aboutintelligence.blogspot.com/2008/05/generating-all-possible-pictures.html

  • Anonymous
    May 05, 2010
    Eric wrote: "Binary trees have more structure than arbitrary trees! There are two binary trees with two nodes: one with the child on the right of the root, one with the child on the left of the root. There is only one arbitrary tree with two nodes; there is no difference between the "left" and "right" child." I think this argument depends on what function you are using to generate new nodes in the tree. For example, if you have a binary tree, and you are adding a new value, you could create a new node on the left if the incoming value is less than the current node value. That rule is arbitrary, however, because you could reverse that and instead create a new node on the right. Now does that constitute one binary tree, or two? Also, with arbitrary trees, it depends on how you define them. To say there is no difference between the left and the right child really depends on what type of arbitrary tree you are describing. If you have some type of function which determines how new nodes are created, they may in fact have more structure than a binary tree. For instance, if an incoming value is compared to an existing node, you could have it create a left node if the value is less, a right node if the value is greater, and a middle node if the value is equal. (would that make it a tertiary tree?) Anyway, you could scale this "gateway" function so that four, five, or six (or more) nodes are created. I would argue that a six-node tree would have more structure than a binary tree in this case. I haven't done the math on it, but would it possibly have 6! different configurations?

  • Anonymous
    May 05, 2010
    Oops...I meant to write "ternary tree" not "tertiary tree". And...a four-node tree would be a quadtree, etc.