Hardening Server Applications [Immo]

From time to time a company ships a product that has a huge impact on its ecosystem. For us, a good example is certainly .NET. The biggest value proposition of managed code is that it is, well, managed code. The CLR provides runtime components such as a garbage collector and reflection that aim to reduce the likelihood of bugs and to increase developer productivity. These features allow developers to focus on building their applications instead of tweaking and massaging mechanics. After all, the hope is that by improving the software used to develop other software, we improve software on a broader scale (how meta!).

Today we are proud to make an announcement that potentially marks a milestone similar to the announcement of .NET at PDC 2000. Over the last few years, several teams at Microsoft have worked on an upcoming product, code-named "Source Code".

The Problem

We have all seen it: a customer reports a bug, and after debugging it for a while we realize that the fix involves changing a single line. Sometimes the fix involves changing only a single character, as with the famous off-by-one error where one only needs to replace "<=" with "<".
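To make the off-by-one case concrete, here is a small illustrative sketch; the function names are hypothetical. The buggy loop uses `<=` and reads one element past the end of the list, and the fix is the single-character change to `<`.

```python
def sum_items_buggy(items):
    """Off-by-one: the loop condition uses <= and reads past the end."""
    total = 0
    i = 0
    while i <= len(items):  # bug: the last valid index is len(items) - 1
        total += items[i]
        i += 1
    return total


def sum_items_fixed(items):
    """The same loop with the one-character fix: <= becomes <."""
    total = 0
    i = 0
    while i < len(items):
        total += items[i]
        i += 1
    return total


if __name__ == "__main__":
    try:
        sum_items_buggy([1, 2, 3])
    except IndexError as exc:
        print("buggy version failed:", exc)  # buggy version failed: list index out of range
    print(sum_items_fixed([1, 2, 3]))  # 6
```

In idiomatic Python a plain `for item in items` avoids the index arithmetic altogether; the explicit index loop is kept only to mirror the `<=` to `<` fix.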

Quite frequently, small errors can have devastating effects. For example, an Ariane 5 launch vehicle had to self-destruct because of a single conversion error in which a 64-bit floating-point value was converted to a 16-bit signed integer. The resulting arithmetic overflow ultimately caused the flight computer to make wrong steering adjustments.
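The flight software was written in Ada, not Python, so the following is only an illustration of the range problem: a signed 16-bit integer holds values from -32768 to 32767, while a 64-bit floating-point value can be far larger. The helper names are our own. `to_int16_checked` refuses out-of-range values with an exception, while `to_int16_wrapping` keeps only the low 16 bits, which raises no error but silently yields a plausible-looking wrong number.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_checked(value: float) -> int:
    """Convert a float to a signed 16-bit integer, refusing values that do not fit."""
    n = int(value)  # truncates toward zero
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a signed 16-bit integer")
    return n


def to_int16_wrapping(value: float) -> int:
    """Keep only the low 16 bits (two's complement): no error, but a wrong number."""
    n = int(value) & 0xFFFF
    return n - 0x10000 if n > INT16_MAX else n


if __name__ == "__main__":
    print(to_int16_checked(1234.9))    # 1234
    print(to_int16_wrapping(40000.0))  # -25536
    try:
        to_int16_checked(40000.0)
    except OverflowError as exc:
        print("overflow:", exc)
```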

How "Source Code" works

Research has investigated several strategies to enable computers to learn. One fruitful approach is genetic programming.

In artificial intelligence, Genetic Programming is an evolutionary algorithm-based methodology inspired by biological evolution to find computer programs that perform a user-defined task.

---Wikipedia

The basic idea is that by mutating existing code and applying a selection function, one can automatically find a computer program that solves a given problem.
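The mutation-plus-selection loop can be sketched in a few lines. This is a simplified (1+1)-style evolutionary search, not full genetic programming, which usually evolves a whole population of expression trees and also uses crossover. The instruction set (`+1`, `-1`, `*2`), the target function f(x) = 2x + 2, and the test inputs are all hypothetical choices for illustration; fitness is the total error over the tests, so lower is better.

```python
import random

# Hypothetical instruction set: a "program" is a list of these op names,
# applied left to right to an integer input.
OPS = {
    "+1": lambda v: v + 1,
    "-1": lambda v: v - 1,
    "*2": lambda v: v * 2,
}
OP_NAMES = list(OPS)
TEST_INPUTS = range(-5, 6)


def target(x):
    """The behavior we want to discover: f(x) = 2x + 2."""
    return 2 * x + 2


def run(program, x):
    for op in program:
        x = OPS[op](x)
    return x


def fitness(program):
    """Total absolute error over the test inputs; 0 means every test passes."""
    return sum(abs(run(program, x) - target(x)) for x in TEST_INPUTS)


def mutate(program, rng, max_len=8):
    """Return a copy with one random edit: replace, insert, or delete an op."""
    child = list(program)
    kind = rng.choice(["replace", "insert", "delete"])
    if kind == "insert" and len(child) < max_len:
        child.insert(rng.randrange(len(child) + 1), rng.choice(OP_NAMES))
    elif kind == "delete" and len(child) > 1:
        del child[rng.randrange(len(child))]
    else:
        # Point mutation; also the fallback when insert/delete is not allowed.
        child[rng.randrange(len(child))] = rng.choice(OP_NAMES)
    return child


def evolve(generations=2000, seed=0):
    rng = random.Random(seed)
    best = [rng.choice(OP_NAMES)]
    best_fit = fitness(best)
    history = [best_fit]
    for _ in range(generations):
        child = mutate(best, rng)
        child_fit = fitness(child)
        # Selection: the child replaces its parent only if it is no worse.
        if child_fit <= best_fit:
            best, best_fit = child, child_fit
        history.append(best_fit)
        if best_fit == 0:
            break
    return best, best_fit, history


if __name__ == "__main__":
    program, error, _ = evolve()
    print(program, "total error:", error)
```

Accepting children that are merely "no worse" lets the search drift across equally fit programs instead of getting stuck, which is a common choice in this kind of hill-climbing variant.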

Over the last few years, the BCL team, the wider CLR team, and the Visual Studio team have worked with Microsoft Research to build a product around this idea. The result, code-named "Source Code", combines new technologies with evolved versions of existing ones such as IntelliTrace, software transactional memory (STM), and Pex.

The basic idea is simple. Whenever the CLR encounters an unhandled exception, it rolls back the state of the application to a point in time before the crash (8 minutes by default, but this is configurable). To do this, the CLR uses a full IntelliTrace recording so that the entire runtime state, including the heap, can be properly restored. Then the stack trace of the unhandled exception is analyzed to determine which method is the most likely culprit. This information is passed to Pex, which creates a permuted version of the method body ("mutation"). This enables new code paths not previously explored in the application. After that, the CLR resumes execution. If the application crashes again, the process repeats until a code path is found that avoids the error ("selection"). Over time, successful code changes persist, improving the overall fitness of the application ("survival of the fittest").
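The rollback, mutate, and retry cycle can be imitated with a toy simulation. This is our own sketch under loud assumptions, not the product's implementation: `copy.deepcopy` stands in for the IntelliTrace-based restore, a fixed list of hand-written method bodies stands in for Pex-generated mutations, and all function names are hypothetical.

```python
import copy


# Three hypothetical versions of the same method: the original and two "mutations".
def average_v1(state):
    # Original body: crashes with ZeroDivisionError when there are no orders.
    state["avg"] = sum(state["orders"]) / len(state["orders"])


def average_v2(state):
    # Mutation: a different code path that writes partial state, then still crashes.
    state["count"] = len(state["orders"])
    state["avg"] = sum(state["orders"]) / state["count"]


def average_v3(state):
    # Mutation: guards the empty case, so no exception is raised.
    orders = state["orders"]
    state["avg"] = sum(orders) / len(orders) if orders else 0.0


def run_with_self_repair(state, variants):
    """Try each variant in turn; after an exception, restore the snapshot and try the next."""
    for variant in variants:
        snapshot = copy.deepcopy(state)  # stand-in for the recorded runtime state
        try:
            variant(state)
        except Exception:
            # Rollback: discard whatever the failed variant wrote.
            state.clear()
            state.update(snapshot)
            continue
        # Selection: the first variant that completes without an exception survives.
        return variant.__name__, state
    raise RuntimeError("no variant completed without an exception")


if __name__ == "__main__":
    winner, final_state = run_with_self_repair(
        {"orders": []}, [average_v1, average_v2, average_v3]
    )
    print(winner, final_state)  # average_v3 {'orders': [], 'avg': 0.0}
```

Note that the selection criterion here is only "does not throw": the sketch has no way of knowing whether 0.0 is actually the right average for an empty order list.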

Picture of the M5 computer in the Star Trek episode "The Ultimate Computer".

As a result of this process, an application will automatically correct itself over time! The brilliant Dr. Richard Daystrom, designer of the "M5 Multitronic System" (pictured above), would be very pleased to see this advancement.

The technology is currently in an early prototype state, and we are working on removing some restrictions:

  • Because of the downtime involved in rollback and mutation, this scenario will only be available to ASP.NET applications running on Windows Azure. In the future, we plan to extend support to client applications as well.
  • "Source Code" will only save a limited number of application versions. We have seen cases where partially working versions were displaced from the cache by buggier ones. Our next goal is to improve the cache policy to avoid these situations.
  • The early CTP does not include support for management and monitoring, but we are actively working to integrate "Source Code" with the Microsoft System Center products.
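One way to read the cache problem in the second bullet: if versions are evicted in arrival order, a newer but buggier version can push out an older, partially working one. A fitness-aware policy evicts the least fit entry instead. This is our own illustration; the class name and the fitness score (higher is better) are hypothetical and not part of the CTP.

```python
import heapq
import itertools


class FitnessCache:
    """Bounded cache of application versions that evicts the least fit entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []  # min-heap of (fitness, arrival_order, version_id)
        self._arrivals = itertools.count()

    def add(self, version_id, fitness):
        entry = (fitness, next(self._arrivals), version_id)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif fitness > self._heap[0][0]:
            # Strictly fitter than the current worst entry: evict that one.
            heapq.heapreplace(self._heap, entry)
        # Otherwise the new version is no better than anything cached and is dropped.

    def versions(self):
        """Cached version ids, fittest first."""
        return [vid for _, _, vid in sorted(self._heap, reverse=True)]


if __name__ == "__main__":
    cache = FitnessCache(capacity=2)
    cache.add("v1", 0.6)
    cache.add("v2", 0.9)
    cache.add("v3", 0.2)  # buggier than everything cached: dropped
    cache.add("v4", 0.7)  # evicts v1, the least fit entry
    print(cache.versions())  # ['v2', 'v4']
```

For comparison, a first-in, first-out cache of the same size would have ended up holding `v3` and `v4`, discarding the fittest version `v2`.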

You can download a CTP here.

Comments

  • Anonymous
    March 31, 2011
    I have to commend the BCL team -- this could very well be the single most revolutionary advancement in computing to come out since the last vernal equinox. Bravo men and women. Bravo. The only thing I'm deeply concerned about is your mentioning of how we will be able to configure the roll back destination point in time to anything we want. Is this a wise thing to do? My own research tells me that adjusting the destination point in time anywhere beyond 8 minutes could very well end up creating a large energy-burst powerful enough to disrupt the fabric of space-time and shift the dimensions themselves. I think we all know what happens then. ... Seriously though, I started to become very excited while reading this...until I realized what day it was... Matt Weber http://omniscientist.net

  • Anonymous
    March 31, 2011
    It would be cool if Intellitrace could capture the space-time continuum so that we can rollback and kick the dev who introduced the bug in the first place. ;)

  • Anonymous
    April 01, 2011
    Clearly there's more to the story than what you've said here.  Obviously without a mechanism to formalize and express intent, simply mutating the code until it simply no longer throws an exception would be at best naive and at worst highly dangerous.  I'm looking forward to understanding more about this approach.

  • Anonymous
    April 01, 2011
    2 Praseeth: I will use that time-travelling Intellitrace to go to the day I ran my car into the tree and give myself a kick before it happened. :)

  • Anonymous
    April 01, 2011
    In my defense, I never actually followed the link so at least I pieced it together myself.  Corrupted State Exceptions upon all of you! Cheers, -Brian

  • Anonymous
    April 01, 2011
    "Hmm, intriguing idea!" ...1 minute... "Wait, how could that possibly work?" ...1 minute... "What day is it again?" :-) Igor Ostrovsky http://igoro.com/

  • Anonymous
    April 01, 2011
    It sounds cool! I hope the next version of Hardening Server Applications will be able to recognize a programmer who could make a bug and fix his/her DNA.

  • Anonymous
    April 01, 2011
    Haha nice approach :)

  • Anonymous
    April 02, 2011
    Great work!  Another step closer to catching up with 1970's Mainframe technology. ;)