Toolbar Compatibility Debugging Walkthrough
In the past I’ve found debugging walkthroughs useful for picking up new techniques. In that spirit, here’s a quick rundown of a bug I was investigating today that may have some useful tidbits.
This was a crash in IE involving a toolbar whose source code I didn’t have. If you clicked in the toolbar’s edit box and later closed the browser, IE would crash.
It was crashing while trying to call Release() on a pointer to the toolbar, so at first it looked like a reference counting issue, either in IE or in the toolbar code itself. This type of bug can be tricky to track down even in your own code, so given that this one straddles legacy IE code and external toolbar code, I closed my door and prepared for the worst. :-)
I started out by turning on full pageheap using gflags.exe, which is part of the standard Debugging Tools for Windows package, and repro’d the bug. This was to make sure the crash wasn't a side effect of heap corruption and that I was debugging the right thing.
Next I put a breakpoint on the toolbar’s Release(). Since I didn't have the source, I had to find it manually:
0:005> kP 1
ChildEBP RetAddr
01eaeab0 0074f7fb xxxxx!xxxx::_xxxxxxx(
struct IUnknown * ptb = 0x020f3940)
0:005> dds 0x020f3940
020f3940 10031b44 toolbar!DllMain+0x27d24
020f3944 10031b2c toolbar!DllMain+0x27d0c
020f3948 10031b18 toolbar!DllMain+0x27cf8
020f394c 10031af8 toolbar!DllMain+0x27cd8
020f3950 10031ad8 toolbar!DllMain+0x27cb8
020f3954 10031f50 toolbar!DllMain+0x28130
020f3958 00000003
[...]
0:005> dds 10031b44
10031b44 1000cc90 toolbar!DllMain+0x2e70
10031b48 1000cdd0 toolbar!DllMain+0x2fb0
10031b4c 1000cdf0 toolbar!DllMain+0x2fd0
10031b50 1000ce20 toolbar!DllMain+0x3000
[...]
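Every address in these dumps is labeled toolbar!DllMain+offset, including the vtable entries that turn out to be IUnknown methods. A plausible explanation, and an assumption on my part, is that with only export symbols for the toolbar DLL, the debugger labels each address by the nearest exported symbol at or below it. The sketch below models that lookup. The two export addresses come from this session: DllCanUnloadNow appears at 10008fd0 later on, and DllMain's address, 10009e20, is inferred from 1000cc90 being shown as DllMain+0x2e70. Symbolize() and kToolbarExports are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical export table for the toolbar DLL, keyed by address. Both entries
// are taken from the session: DllCanUnloadNow is shown at 10008fd0, and
// DllMain's address is inferred from "1000cc90 toolbar!DllMain+0x2e70".
const std::map<std::uint32_t, std::string> kToolbarExports = {
    {0x10008fd0u, "DllCanUnloadNow"},
    {0x10009e20u, "DllMain"},
};

// Simplified model (an assumption, not the debugger's actual code) of
// export-only symbolization: label an address with the nearest export at or
// below it, plus the distance from that export.
std::string Symbolize(std::uint32_t addr) {
    auto it = kToolbarExports.upper_bound(addr);  // first export above addr
    if (it == kToolbarExports.begin()) {
        return "(no symbol)";                     // below every known export
    }
    --it;                                         // nearest export at or below addr
    const std::uint32_t offset = addr - it->first;
    std::string label = "toolbar!" + it->second;
    if (offset != 0) {
        char buf[16];
        std::snprintf(buf, sizeof buf, "+0x%x", static_cast<unsigned>(offset));
        label += buf;
    }
    return label;
}
```

For example, Symbolize(0x1000cdf0) yields "toolbar!DllMain+0x2fd0", the third vtable entry above. With export-only symbols, a DllMain+offset name does not tell you which function you are in. That has to come from the object layout or from the disassembly.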
I could also have unassembled the code and traced the logic, but I've found it's often faster to just use "dds" to dump interesting-looking addresses. "dds" is especially useful for dumping the stack when symbols are incomplete (or the stack is corrupt) and for tracking down objects in optimized builds where the debugger gets confused. (When you have symbols and dump an address, it will be immediately obvious from the vtable whether you're looking at the right object.)
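The two dumps above follow COM's binary layout. The first pointer-sized field of an object points to its vtable, and the vtable is an array of function pointers in method-declaration order, with IUnknown's methods first. The sketch below models that layout in the C style that COM's C bindings use: an object struct whose first member is lpVtbl. It is portable C++ with stubbed types rather than the real <unknwn.h> declarations. ToyObject, ToyVtbl, HRESULT_T, and the Toy_* functions are all hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Portable stand-ins for Windows types (an assumption: not the real headers).
typedef int HRESULT_T;
const HRESULT_T kE_NOINTERFACE = static_cast<HRESULT_T>(0x80004002u);

struct ToyObject;

// IUnknown's methods in declaration order: slot 0, slot 1, slot 2.
struct ToyVtbl {
    HRESULT_T (*QueryInterface)(ToyObject* self, const void* iid, void** out);
    unsigned long (*AddRef)(ToyObject* self);
    unsigned long (*Release)(ToyObject* self);
};

// The vtable pointer is the object's first field, so dumping the object's
// address shows a vtable pointer first.
struct ToyObject {
    const ToyVtbl* lpVtbl;
    unsigned long refs;
};

static HRESULT_T Toy_QueryInterface(ToyObject*, const void*, void** out) {
    *out = nullptr;  // this sketch supports no interfaces
    return kE_NOINTERFACE;
}
static unsigned long Toy_AddRef(ToyObject* self) { return ++self->refs; }
static unsigned long Toy_Release(ToyObject* self) { return --self->refs; }  // no delete in this sketch

static const ToyVtbl kToyVtbl = {Toy_QueryInterface, Toy_AddRef, Toy_Release};

// Release() is the third pointer in the table: offset 8 on a 32-bit build,
// matching 10031b4c = 10031b44 + 8 in the dump.
static_assert(offsetof(ToyVtbl, Release) == 2 * sizeof(void (*)()),
              "Release is in vtable slot 2");
```

A call like obj.lpVtbl->Release(&obj) is essentially what a C++ call through an IUnknown* compiles to. That is why identifying the third vtable entry was enough to locate the toolbar's Release().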
The IUnknown interface has three methods: QueryInterface(), AddRef(), and Release(), in that order. Given the vtable dump, I assumed the third entry, toolbar!DllMain+0x2fd0, was Release(), and confirmed it by unassembling the function. It looked right, so I put a breakpoint on its ret instruction:
0:005> u toolbar!DllMain+0x2fd0
[...]
1000ce15 8b06 mov eax,[esi]
1000ce17 5e pop esi
1000ce18 c20400 ret 0x4
1000ce1b cc int 3
0:005> bp 1000ce18
Then I re-ran the repro. For brevity I’ve left out many of the calls and removed redundant output. ‘eax’ holds the return value of Release(), so you can see the count winding down toward the final Release() (at which point the object will delete itself).
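For reference, this is the usual shape of the Release() contract the breakpoint is watching. Release() returns the new count, which is the value in eax at the ret, and the final Release() deletes the object. This is a minimal sketch with a hypothetical RefCounted class, not the toolbar's implementation. Real COM code typically uses InterlockedDecrement for thread safety, which is omitted here.

```cpp
#include <cassert>

// Hypothetical reference-counted object following the IUnknown contract.
class RefCounted {
public:
    static int live_objects;  // how many instances currently exist

    RefCounted() { ++live_objects; }
    RefCounted(const RefCounted&) = delete;
    RefCounted& operator=(const RefCounted&) = delete;

    unsigned long AddRef() { return ++refs_; }

    unsigned long Release() {
        assert(refs_ > 0 && "Release() called more often than AddRef()");
        // Copy the count to a local first: after "delete this" the member
        // is gone and must not be read.
        const unsigned long remaining = --refs_;
        if (remaining == 0) {
            delete this;  // final Release(): the object frees itself
        }
        return remaining;  // this is what shows up in eax at the ret
    }

private:
    ~RefCounted() { --live_objects; }  // private: only Release() may destroy
    unsigned long refs_ = 1;           // the creator holds the first reference
};

int RefCounted::live_objects = 0;
```

After three AddRef() calls on a fresh object, successive Release() calls return 3, 2, 1, and 0. The last call also destroys the object, the same countdown visible in eax below.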
0:005> g
Breakpoint 1 hit
eax=00000004 ebx=020f39ec ecx=020f395c edx=00000850 esi=020f3d84 edi=00000000
eip=1000ce18 esp=01eaf4cc ebp=00000000 iopl=0 nv up ei pl nz na pe cy
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000203
toolbar!DllMain+0x2ff8:
1000ce18 c20400 ret 0x4
0:005> g
Breakpoint 1 hit
eax=00000003 ebx=014ab010 ecx=020f395c edx=10031b18 esi=0224000a edi=00000008
0:005> g
Breakpoint 1 hit
eax=00000002 ebx=014ab010 ecx=020f395c edx=10031b44 esi=0224000a edi=00000008
0:005> g
wn IEFRAME CDocObjectView::DestroyViewWindow(): Destroying Host Window
Breakpoint 1 hit
eax=00000001 ebx=00000000 ecx=020f395c edx=00803e30 esi=020f3d04 edi=020f9328
0:005> g
Unable to remove breakpoint 1 at 1000ce18, Win32 error 487
"Attempt to access invalid address."
The breakpoint was set with BP. If you want breakpoints
to track module load/unload state you must use BU.
(564.db0): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
Unable to remove breakpoint 1 at 1000ce18, Win32 error 487
"Attempt to access invalid address."
The breakpoint was set with BP. If you want breakpoints
to track module load/unload state you must use BU.
Ah ha! This wasn’t what I was looking for, but you can see that before we get to the final release (or the crash), the debugger complains that it can't remove a breakpoint because the module it was set in has been unloaded. The crash happens shortly after this and is simply caused by trying to call into the module after it’s been unloaded.
So why was it unloaded? Let’s have the debugger break when the module unloads, re-run the repro, and find out:
0:005> sxe ud:toolbar
0:005> g
[...]
Breakpoint 1 hit
eax=00000001 ebx=00000000 ecx=020f395c edx=00803e30 esi=020f3d04 edi=020f9328
eip=1000ce18 esp=01eafa98 ebp=00000000 iopl=0 nv up ei pl nz na pe cy
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000203
toolbar!DllMain+0x2ff8:
1000ce18 c20400 ret 0x4
0:005> k
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\WINDOWS\system32\kernel32.dll -
ChildEBP RetAddr
WARNING: Stack unwind information not available. Following frames may be wrong.
01eafcf4 7c80aa7f ntdll!KiFastSystemCallRet
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\WINDOWS\system32\ole32.dll -
01eafd08 77513442 kernel32!FreeLibrary+0x19
01eafd14 77513456 ole32!CoFreeUnusedLibraries+0xa9
01eafeb8 77513578 ole32!CoFreeUnusedLibraries+0xbd
01eafec8 775133a2 ole32!CoFreeUnusedLibrariesEx+0x2e
01eafeec 007ab40f ole32!CoFreeUnusedLibraries+0x9
01eaffb4 7c80b50b xxxxx!xxx::_xxxxxxxx+0x3af
It’s being unloaded when IE’s code calls CoFreeUnusedLibrariesEx() as the window closes. This is code I'm not super-familiar with, but I presume we do this to unload DLLs for BHOs, toolbars, ActiveX controls, and so on, to free up memory. However, we still hold properly reference-counted pointers to the toolbar, so it shouldn’t be unloaded quite yet.
According to MSDN, CoFreeUnusedLibrariesEx() calls DllCanUnloadNow(), which is supposed to return S_FALSE if the DLL is not yet ready to be unloaded. Let’s set a breakpoint on it, step through the function, and see what it returns in this scenario:
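Conceptually, CoFreeUnusedLibrariesEx() asks each loaded in-process server's DllCanUnloadNow() whether it can go away, and frees the ones that answer S_OK. The sketch below is a simplified model of that loop. Module and FreeUnusedModules are hypothetical names. The real runtime also applies an unload delay (the Ex variant takes one as a parameter) and other bookkeeping that this sketch skips. What the sketch shows is that the host relies entirely on the DLL's answer.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

constexpr long kS_OK = 0;     // same value as S_OK
constexpr long kS_FALSE = 1;  // same value as S_FALSE

// Hypothetical record for a loaded in-process server.
struct Module {
    std::string name;
    std::function<long()> dll_can_unload_now;  // stands in for the DLL's exported DllCanUnloadNow()
    bool loaded = true;
};

// Simplified model of one CoFreeUnusedLibraries pass.
// Returns the names of the modules unloaded on this pass.
std::vector<std::string> FreeUnusedModules(std::vector<Module>& modules) {
    std::vector<std::string> unloaded;
    for (Module& m : modules) {
        // Only the module's own answer matters; the loop has no way to see
        // who still holds pointers to the module's objects.
        if (m.loaded && m.dll_can_unload_now() == kS_OK) {
            m.loaded = false;  // the stack above shows ole32 ending up in FreeLibrary()
            unloaded.push_back(m.name);
        }
    }
    return unloaded;
}
```

A module whose DllCanUnloadNow() wrongly answers S_OK is unloaded on the spot, whatever references its callers still hold.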
0:005> bp toolbar!DllCanUnloadNow
0:005> g
[...]
Breakpoint 0 hit
eax=00000000 ebx=00000001 ecx=77606074 edx=00000000 esi=014a6dd0 edi=77606068
eip=10008fd0 esp=01eafd20 ebp=01eafd30 iopl=0 nv up ei pl zr na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
toolbar!DllCanUnloadNow:
10008fd0 8b0d3c1c0410 mov ecx,[toolbar!DllMain+0x37e1c (10041c3c)] ds:0023:10041c3c=00000000
0:005> p
[...]
0:005> p
eax=00000000 ebx=00000001 ecx=00000000 edx=00000000 esi=014a6dd0 edi=77606068
eip=10008fdd esp=01eafd20 ebp=01eafd30 iopl=0 nv up ei pl zr na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
toolbar!DllCanUnloadNow+0xd:
10008fdd c3 ret
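As you can see from ‘eax’, the function returns 0, which is S_OK. In other words, the toolbar tells COM it is safe to unload even though IE still holds a reference to one of its objects. I believe this is the cause of the bug.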
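We don't have the toolbar's source, so what goes wrong inside it is unknown. The trace does show DllCanUnloadNow() reading a global at 10041c3c that holds 0 and then returning 0. One plausible way to get there is a module-wide object count that some object never adds itself to. The sketch below shows the conventional pattern: count live objects and server locks, and return S_FALSE while either is nonzero. It also shows an object that skips the count and so makes a correct-looking DllCanUnloadNow() give the wrong answer. All names are hypothetical.

```cpp
#include <cassert>

constexpr long kS_OK = 0;     // same value as S_OK
constexpr long kS_FALSE = 1;  // same value as S_FALSE

// Module-wide counters a well-behaved in-process server keeps (hypothetical names).
long g_liveObjects = 0;  // objects created and not yet destroyed
long g_serverLocks = 0;  // outstanding IClassFactory::LockServer(TRUE) calls

// Conventional DllCanUnloadNow() logic: allow unloading only when no object
// and no lock could still lead a caller into this DLL's code.
long CanUnloadNow() {
    return (g_liveObjects == 0 && g_serverLocks == 0) ? kS_OK : kS_FALSE;
}

// Correct: every instance is counted for its whole lifetime.
class CountedToolbar {
public:
    CountedToolbar() { ++g_liveObjects; }
    ~CountedToolbar() { --g_liveObjects; }
    CountedToolbar(const CountedToolbar&) = delete;
    CountedToolbar& operator=(const CountedToolbar&) = delete;
};

// Buggy (hypothetical): never touches g_liveObjects, so while an instance is
// alive CanUnloadNow() still answers S_OK, like the 0 seen in eax above.
class UncountedToolbar {
public:
    UncountedToolbar() = default;
};
```

The counting has to cover every object that leaves the DLL, because DllCanUnloadNow() can only report what the counter knows about.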
After reviewing recent changes I had made in IE7, I found that one of them caused us to hold onto the toolbar object longer than we used to. In previous versions of IE we always happened to do the final Release() before calling CoFreeUnusedLibrariesEx(), which masked the bug in the toolbar. The fix in this case, for better or worse, was to update the code so that we release earlier, as we used to.
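A toy model makes the ordering dependency easy to see. Below, BuggyModule stands in for a DLL whose DllCanUnloadNow() says yes while an object is still alive, as the toolbar's did here. ShutdownCallsUnloadedCode() runs the host's two teardown steps in either order. All names are hypothetical. "Calling into an unloaded module" is only a flag here, whereas in the real process it was the access violation in the trace.

```cpp
#include <cassert>

// Stand-in for the toolbar DLL with the bug: it always reports that it can be
// unloaded, even while the host still holds a reference to one of its objects.
struct BuggyModule {
    bool loaded = true;
    bool CanUnloadNow() const { return true; }
};

// Models the host's shutdown. release_first == true is the old order (final
// Release() before freeing unused libraries); false is the order that exposed
// the bug. Returns true if the host called into the module after it was unloaded.
bool ShutdownCallsUnloadedCode(bool release_first) {
    BuggyModule module;
    bool called_unloaded_code = false;

    // Releasing the last pointer runs the object's Release(), which is DLL code.
    auto release_toolbar = [&] {
        if (!module.loaded) called_unloaded_code = true;  // the crash in the trace
    };
    // Stand-in for CoFreeUnusedLibrariesEx(): relies on the module's answer.
    auto free_unused_libraries = [&] {
        if (module.CanUnloadNow()) module.loaded = false;
    };

    if (release_first) {
        release_toolbar();
        free_unused_libraries();
    } else {
        free_unused_libraries();
        release_toolbar();
    }
    return called_unloaded_code;
}
```

With a correct DllCanUnloadNow(), both orders would be safe. The reordering only works around the toolbar's bug.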
Thoughts? Are these types of walkthroughs interesting or useful? If so I’ll do more of them.
Comments
- Anonymous
January 09, 2006
Just wondering if you guys use tools like IDA Pro and SoftICE in your day-to-day debugging of problems like this, or do you just stick with the Windows tools? The reason I ask is that I'm very interested in debugging/reverse engineering as a hobby, but most books I have seen talk about IDA Pro or SoftICE. Any thoughts?
- Anonymous
January 09, 2006
dpp is another command useful for identifying objects with vtables. It's kind of like dps but with an extra dereference, so if you do dpp esp you'll see if there are any COM/C++ object pointers on the stack (provided you have private symbols of course).
Kris - in the Windows group, most people use kd/ntsd/cdb. These debuggers evolve with the OS so you always have support for the latest features. For example, if the heap implementation changes, !heap command will be updated and so on.
- Anonymous
January 10, 2006
Ah, I hadn't tried 'dpp'. That is pretty useful.
Kris, as Pavel said, for the most part we use the standard Windows debuggers. However, I'm sure some people here use IDA Pro and other tools for hard-core application compatibility debugging and perhaps other work.
A long time ago a team I was on helped load balance a few application compatibility bugs for XP, and for a couple of them I found OllyDbg (http://www.ollydbg.de) to be helpful.
- Anonymous
January 11, 2006
So, in the end, did you keep your change in place and notify the toolbar developer? Or for the sake of compatibility are you reverting to the previous behavior? (I'm hoping for the former.)
- Anonymous
January 12, 2006
PatriotB - newer versions of this toolbar do not have this problem, so it has already been fixed.
However, the fix will help users of the older toolbar as well as unknown toolbars that might have the same type of bug.
(In this case the fix also cleaned up the code slightly by removing a redundant pointer. :-)
- Anonymous
January 12, 2006
Oops, looks like I had missed the last sentence "The fix in this case, for better or worse, was to update the code so that we release earlier like we used to."
- Anonymous
January 31, 2006
"Thoughts? Are these types of walkthroughs interesting or useful? If so I’ll do more of them."
Keep them coming please. Your friendly neighborhood tester appreciates it :).