A fate worse than death – well, the death of your process

One question that we are sometimes asked is "Why couldn’t your runtime recover from condition X?" The answer is fairly interesting, and it is not limited to our runtimes. Let’s look at a few of the possible problems.

1. Out of memory exceptions. Most commonly, the question is asked about .NET code. When you get the exception, there is generally some space left in the process. The question has to be why the GC doesn’t recover some memory to stop this happening. Actually, it is normally the attempt to do this that tips the process over the edge. The first thing that you will need if you are going to do a GC is a big contiguous lump of memory. What you generally don’t have in low memory situations is a big contiguous lump of memory. The GC will work very hard to keep you running right up until the moment when it can’t run.

2. Corrupted heap. You can trap access violations (AVs), so why can’t you continue? The answer is that you could, just so long as you don’t much care what your application is doing or what is happening to your databases and whatnot. If the heap is corrupt then the only thing that you know for sure is that your process state is unknown. Running a program that does apparently random things is bad.

3. Corrupted stack. Uh, sure. Where were we and what was our data? We no longer know. We might not know where our exception handler is any more. There is no sensible way to handle this error.

4. Deadly embrace. Well, there are cases where this can be solved by killing off one of the contenders, mainly when we are talking about database lock contention. What happens if we try the same thing with threads? That is not so good. We don’t know what resources the thread was holding or what it was halfway through changing. If we kill the thread and award the mutex (or other synchronisation object) to the winner then the process is in an undefined state. We will soon crash. The only safe option is to kill the process.
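To make point 1 a little more concrete, here is a toy model of a fragmented heap. It is only a sketch: the 16-slot heap, the function names and the "every other slot in use" layout are all made up for illustration and have nothing to do with how the real .NET GC tracks memory. The point is simply that "free space left" and "a big contiguous lump" are different things.

```python
def largest_free_run(heap):
    """Return (total_free, largest_contiguous_free) for a list of booleans.

    True marks a slot that is in use, False marks a free slot.
    """
    total_free = 0
    longest = 0
    current = 0
    for used in heap:
        if used:
            current = 0          # a used slot ends the current free run
        else:
            total_free += 1
            current += 1
            longest = max(longest, current)
    return total_free, longest


def can_allocate(heap, size):
    """An allocation needs `size` free slots in a row, not just `size` free slots."""
    return largest_free_run(heap)[1] >= size


# Hypothetical heap: every other slot is still in use, so half the heap is free.
heap = [i % 2 == 0 for i in range(16)]
total_free, longest = largest_free_run(heap)

print("free slots:", total_free)
print("largest contiguous run:", longest)
print("can allocate 4 slots:", can_allocate(heap, 4))
```

Half of this imaginary heap is free, yet a request for four slots in a row cannot be satisfied.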
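Point 4 can be sketched in a few lines as well. Python cannot kill a thread, so in this sketch a generator stands in for the victim thread and "killing" it just means never resuming it. The Accounts class, the amounts and the transfer are all hypothetical. What it shows is that handing the lock to someone else does nothing to repair the half-finished update that the lock was protecting.

```python
import threading


class Accounts:
    """Hypothetical shared state: the invariant is a + b == 100."""

    def __init__(self):
        self.lock = threading.Lock()
        self.a = 100
        self.b = 0

    def total(self):
        return self.a + self.b


def transfer_steps(accounts, amount):
    """Stand-in for a thread; the yield marks a point where a kill could land."""
    accounts.lock.acquire()
    accounts.a -= amount
    yield "debited"              # the "thread" is killed here
    accounts.b += amount
    accounts.lock.release()
    yield "credited"


accounts = Accounts()
worker = transfer_steps(accounts, 30)
next(worker)                     # run the victim up to the first yield, then abandon it

lock_still_held = accounts.lock.locked()

# "Award" the lock to the surviving contender by forcing it open.
accounts.lock.release()
total_after_kill = accounts.total()

print("lock was still held by the dead worker:", lock_still_held)
print("total after kill:", total_after_kill, "(should have been 100)")
```

The survivor now owns the lock and happily carries on with data that no longer adds up, which is the undefined state described above.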

How bad is it if the application dies? Well, that depends on the application. If you are controlling the avionics of a fighter plane, it is a disaster. If you are a stateless server, people might not notice because you will be brought right back up again. In most commercial applications where supersonic manoeuvres are not a feature, crashing out is much less damaging than corrupting a database. There are fates worse than death.

Can you ever end up running something that isn’t code and if so, how fatal should it be considered? Well, yes, you can. If you have a call in to a DLL that has unloaded then you will try to run whatever happens to be in memory. Typically, this will happen when you call Release on a component that has been reference counted incorrectly. If this happens, then one of three possibilities exists. The first is that the code is still in memory because we haven’t cleaned up yet and the code runs, even if it doesn’t do quite what it should. The second is that the address is no longer valid in the process and we AV on the call. The third is the most interesting. We run whatever bytes are there. As long as there is execute permission on the page, Windows will not stop us. When this happens, normally there is a crash pretty quickly. This will generally be caught and handled, though a lot of code will actually ignore the exception because there is no obvious "good" thing to do. It is possible that the code will not actually crash but will do apparently random things instead. If that happens, your chances of debugging it will be slim at best.
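The reference counting mistake is easier to see in a toy model. ToyModule below is a made-up class, not a real COM or loader API, and it cheats in one important way: it checks whether the module is still loaded before every call. A real process has no such check, which is exactly why the three outcomes above are possible.

```python
class ToyModule:
    """Hypothetical stand-in for a DLL that unloads when its count reaches zero."""

    def __init__(self):
        self.ref_count = 0
        self.loaded = True

    def add_ref(self):
        self.ref_count += 1

    def release(self):
        self.ref_count -= 1
        if self.ref_count == 0:
            self.loaded = False  # a real loader would unmap the code here

    def call(self):
        if not self.loaded:
            # The toy model can detect this. Real hardware just jumps to the address.
            raise RuntimeError("call into unloaded module")
        return "ok"


module = ToyModule()
module.add_ref()    # client A takes a reference
module.add_ref()    # client B takes a reference
module.release()    # client A is finished
module.release()    # bug: client A releases a second time, count hits zero

# Client B still believes that its reference is valid.
try:
    result = module.call()
except RuntimeError as exc:
    result = str(exc)

print(result)
```

Client B did nothing wrong, but it is the one that ends up calling into code that is no longer there.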

By the way, if there are any other topics that you would like me to meander on about, please let me know.

Signing off

Mark
