Reaching Out to 'C' Developers (How I Learned to Love Garbage)

Growing up in Canada made me aware of the importance of bilingualism. Indeed, I can speak both 'C' and C#, although my native tongue is ‘C’, and C++/COM still seem to me like slightly odd dialects.

When I started on the .NET Compact Framework project, I had to get comfortable with the concept of “object reachability”, something that Visual Basic (and many other language) developers took for granted. I had spent so much mental energy on the problems of making sure that my pointers were valid and that I always knew the length of the things they pointed to (and those surprising little spaces between things that compilers liked to insert), the notion that there was some kind of magic that made this all irrelevant bothered me. Had I missed something obvious for all those years? Could this thing, the garbage collector, really work? Isn’t it just ref counting for me? Can it read my mind? Scary.

The reachability model in ‘C’ is pretty straightforward. Everything is reachable. All the time. Even after the very thing you are reaching for is no longer there, you can still reach it. You can reach for stack variables during their lifetime, and then later on too. Sometimes you can reach into the stack frame itself from a computer on the network. You can reach the data structures that you allocated on the heap, and also the ones allocated by developers in building 118, and also building 27. And you can change them too. You can reach the beginning of an array, the middle, the end, and just beyond the end.

The power can go to your head. And make you work late. Heap fragmentation is a CS student’s dream. You can spend countless hours devising clever heap management algorithms to make your apps run fast. Frankly we have more than a few clever heap management algorithms here at Microsoft. Too bad you couldn’t just move the blocks around like a game of Tetris. But you’re always fighting the power of ‘C’ reachability.

Garbage collection as first described to me: "You allocate things but you don’t free them. The GC occasionally takes a look around for things that are no longer in use, and then frees them." Wow. I’m imagining the GC scanning my heap and trying to decide if I’m using something or might want to use it again, without me giving it a clue. I didn't like it at all.

The first breakthrough in my understanding (after many hours with smart folks on my team drawing circles and arrows for me on whiteboards), came from the info that these blocks of memory are not really floating around unsupervised. At any time, the GC can find all blocks of memory that it has allocated (it keeps a list). I was feeling a bit better.

The problem now comes down to one of figuring out what is no longer used (“unreachable”) and freeing it.

Typesafety is the building block to making this an approachable problem. Typesafety is the mechanism that makes this illegal:

pSomeThing = (SomeThing *) 42;

Once the system is typesafe, you can deterministically find all references (pointers) to objects and follow these references to find all the reachable objects. The GC then subtracts the set of reachable objects from the set of all objects and releases the unreachable objects. Why didn’t I think of that? Actually, the problem of finding all the reachable objects is rather complex.

The trick is to freeze program execution, locate all the "roots" and follow all the trees of references to objects to references of objects recursively. The roots consist of all global references, and all references that are allocated on the stack frames (method calls) of every thread (and maybe machine registers in highly optimized code). If you think about this long enough, it starts to make sense, and it’s darn clever, but it’s a rather backward perspective when you are actually writing code, and you want to know when you are making something unreachable – the mysterious delayed uncalled call to free().

Note that nothing is keeping an ongoing record of which objects are reachable or unreachable. It turns out that GC allocators are very fast, since they just hand out blocks without concern for fragmentation. And the GC really adds no overhead to the system while running. Most of the overhead in garbage collectors is in the collection phase. It is for this reason that the system tries to defer collections for as long as possible. It's also noteworthy that the time to run a collection, assume no compaction, is related to the number of reachable, not unreachable objects.

The creation of lots of garbage is not free, however, as the memory pressure on the process will trigger GC collections, and in extreme cases, expensive GC heap compaction.

So here’s the ways that things becomes unreachable. First, let’s look at the storage behind the fields (variables) in a class.

Fields which are declared inside a class and inside a method (intLocal) are called locals, and they are allocated on the thread stack, exactly as they would be in ‘C’. Their lifetime is exactly the lifetime of the method call, and they are released by unwinding the stack.

Non static fields that are declared outside a method are called instance fields (intInstance). The storage for these fields is provided by the GC. The instantiation of a class (via the "new" operator) creates an object, with storage for all instance fields. Instantiating an object does not allocate any storage for locals.

You can think of the set of instance fields in a class roughly the same as struct that you manage on a heap in 'C', and locals as, well, locals.

The slide is somewhat of an oversimplification of the .NET Compact Framework implementation. We actually have very clever heap management algorithms behind the GC heap that are capable of compacting heaps and releasing memory from the process back to the operating system.

Static fields are a special case that I’m going to ignore for now.

Let’s pull up the latest research tool we use here at Microsoft, the Powerpoint Integrated Development Environment, put it in graphical debug mode and set some breakpoints.

A single thread is running on this system. A local variable, called LocalVar1 was allocated on the stack when the method 1stMethod() (inside some class we can ignore) was called. After a call to new to instantiate an object of type Class1, LocalVar1 is a reference to an object and the object is reachable. Simple stuff.

Now we step forward a few lines until after 1stMethod() has returned. The reference variable on the stack is gone, and our object is orphaned. We have just created our first unreachable object.

The next scenario starts out like the previous, with LocalVar1 referencing an object of type Class1.

Another statement is executed, which instantiates another object and assigns it to the very same local variable, LocalVar1.

The previous object that LocalVar1 pointed to is now unreachable, even before the method returns.

Of course, you can have a tree of reachable objects, of arbitrary complexity, that all become unreachable when the root of the tree becomes unreachable. In this case, LocalVar1 references an object of type Class1. After instantiating that object, a method was called in it that instantiated another object of type Class2 and assigned it to the instance field (not on the stack), InstanceVar1, in the first object.

One statement can make lots of things unreachable.

Up to this point, you can imagine an implementation did lots of ref counting, and ran lots of expensive and complex searches on every assignment and stack unwind might work. It would be ReallySlow™, but it might work.

It gets more complicated.

The last scenario starts like the first scenario. A local variable in 1stMethod() is assigned to an object of type Class1 that has been created with a new statement. Next, the value of that local is passed as a parameter to 2ndMethod(), which is then assigned to another local, LocalVar2. Now, 2 local variables, in two methods, each point to the same object.

Even when 2ndMethod() unwinds and LocalVar2 is released, the object of type Class1 is still reachable. Simply assuming that the destruction of a local variable on the stack would render the object it points to would be an error. In fact, it is necessary to run a complete search of all trees to determine this.

 

Stuff to think about.

 

Mike.

 

This posting is provided "AS IS" with no warranties, and confers no rights.

Comments