Why have a stack?

Last time I discussed why it is that we have all the .NET compilers target an "intermediate language", or "IL", and then have jitters that translate IL to machine code: because doing so ultimately reduces the costs of building a multi-language, multi-hardware platform. Today I want to talk a bit about why IL is the way it is; specifically, why is it a "stack machine"?

To begin with, what is a "stack machine"? Let's consider how you might design a machine language that could describe the operation of adding together two integers to make a third. You could do it like this:

add [address of first addend], [address of second addend], [address of sum]

When the machine encounters this instruction it looks up the values of the addends stored in the two addresses, somehow adds them together -- how it does so is its business -- and stores the result in the third address.

You might instead say that there is a special region of memory called the "accumulator" which knows how to add a given value to itself:

clear_accumulator
increase_accumulator [address of first addend]
increase_accumulator [address of second addend]
save_accumulator [address of sum]

Or, you could say that there is a special region of memory called the "stack" which can grow and shrink; you get access to the items on the top:

push_value_at [address of first addend]
push_value_at [address of second addend]
add
pop_value_into [address of sum]

The "add" instruction takes the two values off the top of the stack, somehow adds them, and then puts the result back on the stack; the net result is that the stack shrinks by one.

A virtual machine where most of the instructions are of this third form is called a "stack machine", for obvious reasons.

IL specifies a stack machine, just like many other virtual machines. But most hardware instruction sets actually more closely resemble the second form: registers are just fancy accumulators. Why then are so many virtual machines specifying stack machines?

There are several reasons, but again, it primarily comes down to lowering costs. Stack machines are very easy to understand, they are very easy to write a compiler front-end for, they are easy to build interpreters for, they are easy to build jitters for, and they provide a concise abstraction for the common case. The common case is that the result of any particular operation is only going to be of interest for a brief period.

Imagine, for example, if we chose the first strategy for IL, and then had to compile an expression like x = A() + B() + C(); What would we have to do in the first case? Something like this:

create_temporary_storage // for result of A()
call A(), [address of temporary storage]
create_temporary_storage // for result of B()
call B(), [address of temporary storage]
create_temporary_storage // for result of first addition
add [address of first temporary storage], [address of second temporary storage], [address of third temporary storage]
...

You see how this goes? The IL is getting huge, and all so that we can keep track of precisely which memory locations are used to store values that we are about to never care about again. A stack abstraction lets the stack implementation deal with the temporary storages; in a stack machine, the same code looks something like:

push [address of x]
call A() // implicitly creates a temporary storage by pushing the result on the stack
call B()
add
call C()
add
store // store result on top of stack in address just below it.

The code is much smaller and much easier to understand. A stack machine is a very simple way to describe a complex computation; by being able to write code for such a simple machine, it lowers the cost of making a compiler. And not only is it easier to write compilers and jitters that target simple stack machines, it is easier to write other code analysis tools as well. The IL verifier, for example, can quickly determine when there is a code path through a method that, say, misaligns the stack, or passes the wrong types of arguments to a method.

Comments

  • Anonymous
    November 28, 2011
    It's simple, it's well understood, and it works. Three wonderful, enviable traits for any system to have.

  • Anonymous
    November 28, 2011
    This articles demonstrates the pros of using a stack which is simplicity but what about the cons? what are we giving away by using a stack machine? Good question; just because the "pros" are enormous does not mean that there are no "cons". The CIL system avoids many of the "cons" because (1) the stack machine is just an intermediate form on the way to being realized as machine code, and (2) the CIL system is not a "pure" stack machine; you can get access to "local" items that are not at the top of the stack easily enough. These are effectively arbitrarily many registers, if you want to, say, perform common subexpression elimination. For a good discussion of some of the problems that stack machines have, see http://en.wikipedia.org/wiki/Stack_machine for details. -- Eric

  • Anonymous
    November 28, 2011
    The comment has been removed

  • Anonymous
    November 28, 2011
    The comment has been removed

  • Anonymous
    November 28, 2011
    But still, the stack-based code does not have to specify all the data sources and destinations (be it memory addresses or registers).

  • Anonymous
    November 28, 2011
    @Joren: no, but in more complex scenarios it sometimes has to do some bit of copying around or shuffling the stack ordering because the data sources and destinations are fixed and implicit... You always gain some, loose some. That said I understand stack-machine are very easy to work with, but I would also tend to think that SSA are not that bad either and lend themselves pretty well to various analysis and optimizations. Anyway we're stuck with the IL the way it is and it's surely not gonna change soon.

  • Anonymous
    November 28, 2011
    The comment has been removed

  • Anonymous
    November 28, 2011
    The comment has been removed

  • Anonymous
    November 28, 2011
    The nice thing about stack VMs is how easy it is to walk the expression tree and generate bytecode for it - you don't need any information shared about the nodes, every one of them can generate its own code independently without any assumptions about what the stack looks like before that, so long as it ends with the newly computed value on top.

  • Anonymous
    November 28, 2011
    Mike Caron: The SSA "registers" (%1, %2, %3 in the example) are all virtual -- they don't "live" anywhere. The JIT compiler's job is to figure out where (if anywhere) each register is stored at any point in a function. The values might be stored in a CPU register, on the stack, or not stored at all. For example, %3 might be stored in a register when it's first computed, then moved to the stack to use it in a function call, and then discarded completely once it's unneeded. Once the lifetime of a register has expired or been moved the location that stored its value may be reused. Also, I should add that stack-based machines are not uncommon. The x87 is a stack architecture that probably everybody reading this post has an implementation of in some way or another. The Burroughs B5000, like the CLR, has a typed, stack architecture.

  • Anonymous
    November 28, 2011
    Woah, Eric, I didn't mean to attack or insult you. I'm sorry it came out that way. I just think you picked a bad example. But I also explained why I think your example was bad. I notice you didn't address my technical argument.

  • Anonymous
    November 28, 2011
    @Mike.Caron: read up on LLVM. %1 etc. are pseudo-registers (most register VMs provide an infinite number of logical registers for convenience). But they can be viewed in another way: as names for nodes in an expression DAG. It's from this that simpler analyses flow.

  • Anonymous
    November 28, 2011
    Eric, FWIW, I actually believe the CLR chose a stack machine because the JVM chose a stack machine, and the JVM chose a stack machine because it was a reasonably compact representation when kept uncompressed in memory, and that in turn was important because it was designed to be interpreted on small devices (settop boxes and the like). But the CLR was from the get-go designed to compile its code, rather than interpret it in any circumstances. I think the CLR would actually have been slightly better off with an SSA design. But there's little to choose between them; it's not difficult to turn one into the other. From what I understand, most non-trivial JIT compilers convert the stack code into SSA anyway.

  • Anonymous
    November 28, 2011
    The JVM (and thus the CLR) stack machines look very much like the Smalltalk bytecode stack machines with types strapped on. The Smalltalk machine, was, I think, originally microcoded on the Alto. At around the same time the AT&T folks including, I think, Rob Pike, worked in the Inferno/Limbo system that had a 3-address IL.  They certainly argued it was quicker and easier to write a (simple, not-very-optimising) translator from 3-address code to machine code than it would have been with a stack machine. a

  • Anonymous
    November 28, 2011
    To add balance, Microsoft also ship register-based virtual machines. So even within mscorp there is no consensus on the best approach. Some history, perhaps: Microsoft acquired a company in 1996 whose product was a register-based virtual machine. There was lots of speculation at the time that this was the basis for MSIL, but that speculation was wrong. Fast forward 15 years and the person who created that virtual machine also designed Microsoft's newest register-based virtual machine - namely the Chakra JavaScript engine: > "IE9 includes a new interpreter which uses a register-based layout, efficient opcode, and use of type optimizations." That company was Colusa and the person is Steve Lucco www.microsoft.com/.../default.mspx

  • Anonymous
    November 28, 2011
    I have always asked myself exactly this question. Why have a stack? I would like to know why building an (optimizing) jitter is easier for a stack machine although the jitter target platform uses assignments?

  • Anonymous
    November 29, 2011
    The comment has been removed

  • Anonymous
    November 29, 2011
    Your "stack machine" example reminds me of Reverse Polish Notation calculators: 7 * (3 + 5)  =  3 5 + 7 *

  • Anonymous
    November 29, 2011
    @Paul. Well that is exactly how you encode a calculation to be executed by a stack machine. I would say, rather sleepily, that they are equivalent.

  • Anonymous
    November 30, 2011
    Paul's comment about reverse Polish is spot on. People keep thinking that computers were invented around 1986, but mainframe computers with stack architecture based on reverse Polish were around in the 1960s. I worked on one for many years, and the instruction set and opportunuties for optimization make them a good choice. en.wikipedia.org/.../Stack_machine

  • Anonymous
    December 01, 2011
    @Paul Whoever thinks computers were invented in 1986 must not have been paying any attention what-so-ever. Even if we assume they were oblivious to the existence and development of "full size" computers, they should still know better. One of the earlier microcomputers, the Altair 8800 came out in 1975.

  • Anonymous
    December 03, 2011
    @Mike A very long time ago (at the time, I was still in secondary school), I wrote a simple expression compiler that targeted a stack machine; and to make the RPN "program" faster, I implemented a peephole optimizer. For patterns that looked like 'push a; push b; add' where a and b are constants, I would replace it with 'push c', where c = a+b. I did the same for sub, mul, div etc. And I ran this trivial optimizer iteratively over the instruction stream until it no longer made any replacements. It turns out that if you do this, it amounts to constant subexpression evaluation for expression subtrees that are wholly constant (i.e. it can resolve 1 - 2 - x to -1 - x, but can't do the same for 1 - x - 2). So there are some meaningful optimizations that you can do with stack representations, even fairly cheaply.

  • Anonymous
    December 04, 2011
    @Jsmith I think "not paying attention what-so-ever" is a bit harsh.  From what I remember of my youth, it was awfully easy to dismiss anything that was invented before I was born as not terribly relevant.  I suspect the same is true for many young people today :-).

  • Anonymous
    December 10, 2011
    Re "a tiny bit disingenuous". Do people realise that 'disingenuous' means: "The opposite of ingenuous; lacking in candour or frankness, insincere, morally fraudulent. (Said of persons and their actions.)" [OED]?

  • Anonymous
    December 14, 2011
    I'm a tad disappointed in humanity that this somehow turned into a religious battle. Thanks for the post Eric.