CLR 4.0 advancements in diagnostics

We announced at PDC today that we're making some significant advances in diagnostics tool support for CLR v4!  In particular, we've been investing heavily in improving our support for production diagnostics scenarios over the past couple of years.  I'm excited that we're finally able to start talking about it!

Here's a quick list of some of the things we're doing - stay tuned here, Dave's blog and Mike's blog for more details.  Also feel free to ask questions about our strategy and specific plans for new features on our forum (although we still have a few things we're not ready to talk about yet).  Note that all of the features below are only available when targeting a process that is running inside version 4 of the CLR.

  1. Managed dump debugging
    Finally you'll be able to open crash dump files in Visual Studio and see managed state (stacks, locals, etc.) without using SOS.  The key scenario we want to enable here is taking a dump of an app/server in production (perhaps in an automated way like Windows Error Reporting) and opening that dump in Visual Studio on another machine at some point in the future.  We have a big piece of this working in the VS 2010 CTP, but we still have some work to do before beta (e.g. the CTP only supports dumps with full heap memory).  The experience in VS is very much like being stopped at a breakpoint in a live process, except you can't say "go".  Of course production code tends to have JIT-optimizations enabled, so the normal caveats about debugging optimized code apply here too (e.g. you may not see all locals).  Also, you can't evaluate arbitrary expressions since there is no target process to call functions in (but we have some ideas for how we might compensate for this).  But despite the caveats, this is still a huge feature that should really help improve production diagnostics scenarios.  This work is actually the main visible piece of a much larger "out-of-process debugging" re-architecture we've been working on for years.  This re-arch deserves a post of its own so stay tuned.

  2. Profiler attach (and detach) for memory diagnostics and sampling
    One of the most common feature requests we hear from profiling tools is to be able to attach to a target process (today you have to set some environment variables at process start which cause your profiler to be loaded).  Before you get too excited - this doesn't have everything you want.  In particular, the CLR still doesn't have the ability to change the code of a method once it's been JIT-compiled (EnC is a very special case - not really applicable here).  This means that IL instrumentation isn't available on attach, as well as a few other features (like object allocated callbacks).  But basic memory diagnostics scenarios where the profiler inspects the heap, and simple sampling-based CPU profiling will now work on attach.  We anticipate this will be useful in production scenarios - you can walk up to a server behaving badly and attach a profiler, collect some data, and detach - leaving the process in basically the same state it was before you attached.
    [Update: See Dave's blog entry here for additional details on profiling API improvements]

  3. Registry-free profiler activation
    One major impediment to the sort of production scenario I described above is that today you have to register your profiler in the registry.  In many production scenarios, making some change to the machine-wide system registry is very unappealing (will the dev remember to undo the change when he's done with the server, etc?).  So to really enable production scenarios, we've also supplied a mechanism for running a process under a managed profiler (or attaching) without having to make any changes to the registry.

  4. x64 mixed-mode debugging
    This isn't really a production diagnostics scenario (although you can do x64 mixed-mode dump debugging), but it is one of the main debugging feature requests we've gotten.  With this feature, "mixed-mode" (native+managed) debugging will work for x64 processes in basically the same way it works for x86 today.

  5. Lock inspection
    We're adding some simple APIs to ICorDebug which allow you to explore managed locks (Monitors).  For example, if a thread is blocked waiting for a lock, you can find what other thread is currently holding the lock (and if there is a time-out).

  6. Corrupted-state exceptions
    This feature doesn't come from our team (it's part of the core exception-handling sub-system in the CLR), but in my opinion it's a huge improvement for diagnostics scenarios.  Basically it means that "bad" exceptions (like access violations) that propagate up the stack into managed code no longer (by default) get converted into normal .NET exceptions (System.AccessViolationException) which you can accidentally catch in a "catch(Exception)" clause.  Having a catch(Exception) which swallows AVs coming from native code is a bad thing because you're unlikely to be able to reason about the consistency of your process after the AV.  The default behavior for such "corrupted-state exceptions" is now to fail-fast and send an error-report (just like in normal C++ programming).  Of course, you can override this if you REALLY need to catch such an exception.
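
As a rough illustration of items 2 and 3 above, here is a minimal Python sketch of building the startup environment for launching a process under a profiler. The variable names COR_ENABLE_PROFILING, COR_PROFILER and COR_PROFILER_PATH are the ones documented for the shipped .NET Framework 4 rather than anything stated in this post; the helper name, the CLSID and the DLL path are hypothetical placeholders. This only covers loading a profiler at process start, not attach.

```python
import os


def profiler_environment(clsid, dll_path=None, base_env=None):
    """Return a copy of base_env with the CLR profiler startup variables set.

    clsid    -- the profiler's CLSID string (hypothetical placeholder below)
    dll_path -- optional full path to the profiler DLL; when given, the
                profiler can be located without a registry entry
    base_env -- environment to copy; defaults to the current process's
    """
    env = dict(os.environ if base_env is None else base_env)
    env["COR_ENABLE_PROFILING"] = "1"          # ask the CLR to load a profiler
    env["COR_PROFILER"] = clsid                # which profiler to load
    if dll_path is not None:
        env["COR_PROFILER_PATH"] = dll_path    # registry-free: where the DLL lives
    return env


if __name__ == "__main__":
    # Placeholder CLSID and path, for illustration only.
    env = profiler_environment(
        "{00000000-0000-0000-0000-000000000000}",
        r"C:\tools\MyProfiler.dll",
        base_env={},
    )
    for name in sorted(env):
        print(name + "=" + env[name])
```

The resulting dictionary could then be passed as the env argument of subprocess.Popen when starting the target application, so nothing machine-wide has to be changed or undone afterwards.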

That's the overview of the main CLR v4 features that affect diagnostics.  Of course there are also lots of other great things coming in CLR v4 and the rest of .NET Fx 4.0.
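
One more illustration, this time for lock inspection (item 5): once a tool can ask which thread owns the lock that a blocked thread is waiting for, it can follow those answers from thread to thread and look for a cycle, which would indicate a deadlock. The sketch below is a language-neutral model in Python over a hypothetical snapshot (plain dictionaries of thread and lock names); it does not call ICorDebug, and the function name is made up for the example.

```python
def find_deadlock(waiting_for, owner_of):
    """Look for a cycle of threads that are each blocked on a lock held by the next.

    waiting_for -- dict: blocked thread -> lock it is waiting for
    owner_of    -- dict: lock -> thread currently holding it
    Returns the list of threads in the first cycle found, or None.
    """
    for start in waiting_for:
        seen = []
        thread = start
        # Follow "waits for lock -> lock owner" until the chain ends or repeats.
        while thread in waiting_for and thread not in seen:
            seen.append(thread)
            thread = owner_of.get(waiting_for[thread])
        if thread in seen:
            # Drop any threads that merely lead into the cycle.
            return seen[seen.index(thread):]
    return None


if __name__ == "__main__":
    # Hypothetical snapshot: T1 holds A and wants B, T2 holds B and wants A.
    waiting = {"T1": "B", "T2": "A"}
    owners = {"A": "T1", "B": "T2"}
    print(find_deadlock(waiting, owners))
```

A real tool would fill the two dictionaries from the debugger at a single stopped point, since ownership can change while the process is running.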

Comments

  • Anonymous
    November 01, 2008
    Am I missing something? The current version of the CLR is version 2.0 (for .NET 2.0, 3.0, 3.5 and 3.5 SP1). Are you skipping version number 3 for the CLR? Do you remember the old adventure game Leisure Suit Larry? They skipped version 4 because they lost the floppies. Did the CLR team lose the floppies for CLR 3 or is this yet another strange branding thing like happened with .NET 3.0 and up?

  • Anonymous
    November 03, 2008
    Great question Steven.  No we did not lose the code for CLR v3 <grin>, we just want to try to avoid confusion by keeping the CLR and .NET version numbers in sync as much as possible.  So yes we skipped CLR v3 so that .NET 4.0 would contain CLR v4.

  • Anonymous
    November 05, 2008
    I'm glad you are trying to sync the numbers again, but I never understood why Microsoft got itself into this mess anyway. I must say I was quite frustrated about the whole branding scheme of .NET 3.0. The argument was that .NET 3.0 was a significant improvement over .NET 2.0 (Microsoft added WinFX), but I found this a pretty bad argument compared to the confusion it caused. By naming the next CLR ‘4.0’, you are actually agreeing with this statement. Please don’t listen to those marketing guys again, ever!

  • Anonymous
    November 10, 2008
    Now that we've finally announced at PDC many of the new features coming up in the next major release

  • Anonymous
    December 01, 2008
    Excellent posting, thank you for all the information. I was curious on the managed dump debugging. Is the plan to eventually support this on some form of mini dump (perhaps a new flag in MiniDumpWriteDump to collect the managed information?) Your post made it sound like the CTP requires HeapMemory in the minidumps (MiniDumpWithFullMemory?) It sounds like the plan for release is to support this with a much smaller minidump package though, is that correct? (Beers are on me if it does...)

  • Anonymous
    December 01, 2008
    Fatkenny - You caught me, I was intentionally being vague on the point of minidump support (we weren't done yet).   Yes, the CTP requires MiniDumpWithFullMemory.  Minidump generation has actually worked since CLR 2.0, you just have to use SOS to consume them (and they're limited really to stack traces, dumping active exception objects, and listing threads) - see http://msdn.microsoft.com/en-us/magazine/cc163833.aspx for details. I'm happy to say that we just recently finished adding support for consuming managed minidumps through ICorDebug to the CLR v4 codebase.  So things that worked with SOS in CLR v2 on minidumps will now work through ICorDebug in CLR v4.  Minidump generation hasn't changed a whole lot, but being able to open them in VS with the UI you're used to (and so also access them programmatically with ICorDebug) is a nice bonus. Glad you're interested in this.  I'd be curious to know how you plan to use it.  E.g., do you use Windows Error Reporting today?

  • Anonymous
    December 02, 2008
    Right now we generate our own minidumps at various points in our system (for a variety of reasons. Sometimes this is for unexpected situations like exceptions crossing a boundary, sometimes in attempts to analyze why something did not respond in a reasonable time, etc.) Most of our installations sit on machines which can not access the internet (or even most of the respective corporate intranets.) We tend to have the ability to connect to machines at certain times for support though. At that point we "harvest" whatever minidumps we may have generated. Unfortunately our connections to the client machines run the gamut as far as bandwidth is concerned, so the small svelte nature of non-full memory minidumps is great for us. It is my very laynerd understanding of WER that makes me think it would not be useful in our current situation (we are in a very vertical market.) We have found the minidumps to be INCREDIBLY useful (so useful that I am compelled to use all caps :)) The addition of the ability to be able to see the managed stack in Visual Studio will be an awesome thing for us. (Meaning without using SOS, which we have done some of in house, but we don't do it enough to make it as second nature as normal Visual Studio post mortem unmanaged dump debugging.) Just to clarify though, is the plan to support debugging managed dumps (getting stack traces, exception info, threads, etc) via the smaller size (re: non-full memory) mini dumps at release time? I realize that the need to ship can make this sort of feature not happen, but it sounded like that was y'alls plan (which again, would make me a happy nerd.)

  • Anonymous
    December 02, 2008
    Yes, the plan is to support basic debugging of minidumps (without MiniDumpWithFullMemory or even MiniDumpWithPrivateReadWriteMemory) in Visual Studio 2010.  Sounds like your scenario is a perfect consumer of this.  Be sure to try out Beta1 when it becomes available and let us know how it works for you!

  • Anonymous
    December 13, 2008
    I am also very interested in the (Mini-)Dump support! We have almost the same scenario as fatkenny. We have shipped around 4000 apps (software for controlling industrial machines) which are only connected to our service team if they have a problem. And in this case it is very important to clearly identify the problem. And minidumps were the best we could think of. Currently I am missing source-support in sos.dll/WinDbg. But to my point: I tried it in the VS2010CTP-version and it "worked" with full memory, but it does not show the "correct" callstack! It showed the callstack which wrote the minidump ;) So you need to switch to the exception-callstack...

  • Anonymous
    December 13, 2008
    Thanks Kajo0011, I'm glad this will help your scenarios.   By switch to the exception callstack, you mean the thread that threw the exception?  Yes, I believe that's a known bug, it should start on the crashing thread by default.  We'll try to get this fixed for beta1 (no promises though of course - there are always surprises <grin>).

  • Anonymous
    December 14, 2008
    Thanks rmbyers! Just a small note: The thread is the correct one.... but the callstack is the wrong one. The callstack displays the state of writing the minidump; but it should display the state of the exception (in WinDbg this is the ".ecxr" command). For more details and a repro step see Connect: https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=387985 If you do not switch to the "exception callstack" then the minidump-support is almost useless... Thanks for your help!

  • Anonymous
    December 15, 2008
    Ah, I see what you're saying now.  The main problem is that you're generating your dump on the second-pass of exception-handling (in a catch block), which means the stack that threw the exception has already been unwound (see http://blogs.msdn.com/cbrumme/archive/2003/10/01/51524.aspx).  Even .ecxr wouldn't work here - the context is gone (stack has been at least partially overwritten). If you're generating a dump in response to an exception, you REALLY want to generate the dump before the stack is unwound (since it's that stack that has most of the information on the cause of the exception).  This is actually a problem in native C++ too, but managed code does make it more painful. To generate a dump before the stack is unwound, you probably want to use an "exception filter".  Unfortunately, C# has no syntax for this, so you have to use VB or IL to write it.  I've had this topic on my "to blog" list for a while - I just bumped it to the top of the list - expect something soon... The built-in error-reporting support in the CLR does this - it invokes error reporting (and hence dump generation) before the stack is unwound.  But there is still a fundamental problem with catching and rethrowing exceptions (e.g. reflection catches all exceptions and wraps them in a TargetInvocationException). In addition to this main issue, there may also be some smaller things we can do to make the experience better.  E.g. even when generating a dump on the first-pass, there is still the issue of which context should be presented to the user - the current thread context, or the exception context (.ecxr you mentioned).  Ideally VS will automatically switch to the exception context.  I'm not sure if it does that today.

  • Anonymous
    December 18, 2008
    Hi rmbyers! In my case .ecxr works perfectly!!! Try it with my sample! And use WinDbg 6.7.5.0. It immediately displays the correct stack trace! Because I use the expParam to pass the correct stack to MiniDumpWriteDump! Please use my example and you will see that the minidump is correctly written and WinDbg does correctly show the callstack. Only VS2010CTP does not! http://blog.kalmbachnet.de/files/MiniDumpTest1.zip There is no problem! Neither with unmanaged nor with managed code! Please try it! The problem is only that VS does not read the correct context from the minidump! And today VS2008 correctly switches to the .ecxr context for unmanaged code! In VS2010CTP the new managed-minidump support does not do this! Your minidump-support is currently useless.

  • Anonymous
    December 22, 2008
    Hi, Yes my explanation was overly simple (the stack isn't really "unwound" just made unavailable).   I've written this up in detail in a new blog entry here: http://blogs.msdn.com/rmbyers/archive/2008/12/22/getting-good-dumps-when-an-exception-is-thrown.aspx.  Pay special attention to the second-last paragraph - that's especially for you. The short answer is that VS doesn't support .cxr for native or managed (for both live and dump debugging), but even if it did there would be problems with managed code due to the GC (basically a CLR limitation).

  • Anonymous
    December 22, 2008
    By the way, thanks for all your questions and sample code on this.  It's helped to convince me we should try harder to do more here.  I'm talking with the CLR exceptions team about some things we might do in this space (and also improve the experience when debugging exceptions that are rethrown), but don't expect anything in CLR v4 (we're locking down to ship it now). Thanks!   Rick

  • Anonymous
    December 22, 2008
    Thanks for your answer... I hope that sometime in the future the CLR team will support "full-blown-diagnostic-support" ;)