Analyzing Common CLR Performance Problems
How To Use This Document
This document is intended to help you diagnose common CLR performance issues. Over the years we have seen a wide variety of CLR performance issues from our customers. This document classifies these issues into broad categories and provides guidance for each category. Unless noted otherwise, the facts in this document hold for CLR versions V1.0 and V1.1; where the behavior of the two versions differs, the difference is called out. Any V2.0 behavior documented here is subject to change.
Minimum Requirements for a CLR Performance Investigation
Please follow these guidelines:
• If you are analyzing a memory-related issue, collect:
1) MUST: .NET CLR Memory perf counters trace log
2) MUST: vadump snapshots (vadump -o -p <pid> and vadump -v -p <pid>)
3) MUST: A dump file, so that SOS commands can be run afterwards as a post-process.
4) NICE: Read this article on GC to gain a better understanding of your problems.
5) NICE: If the GC Heap is suspected to be an issue, gather log files of SOS!DumpHeap -stat, SOS!DumpHeap and SOS!EEHeap -gc.
• If you are analyzing an execution-speed issue (things are slow, take too long, etc.), collect:
1) MUST: The .NET CLR Memory\% Time in GC and Process\% Processor Time perf counters.
2) MUST: Sampling profile logs. Please make sure that the symbols are correct and really point to code in the CLR; these will most likely be functions from mscorwks.dll. Note that GC-related functions at the top of the profile usually do not indicate a problem with the GC's performance; more often they indicate excessive allocation by the application.
3) NICE: Sanity-check the machine for excessive paging caused by low memory, poor performance caused by disk fragmentation, and interference from other processes or the network.
PROBLEM: My Application’s Private Bytes Grow Indefinitely
Or
PROBLEM: My Application Is Leaking Memory
1) There are several ways to confirm a memory leak. If you already know that the GC Heap is leaking, skip to step 5).
2) Observe the “Process\Private Bytes” perf counter or the “Mem Usage” column in Task Manager. If they increase over a period of time (which could be as little as a few hours or as much as a few days), you have a memory leak. When the leak manifests itself, collect the “.NET CLR Memory\# Bytes in all Heaps” perf counter for your application. If this counter grows steadily (presumably at the same rate as the application's total memory), you have a managed memory leak. The application “leaks” in spite of the Garbage Collector (GC) because references to objects in the GC heap are still alive. These references, or “roots”, keep the managed objects reachable and prevent the GC from collecting them.
3) If the GC Heap is growing, but more slowly than the application's total memory, you might have small managed objects holding on to large amounts of unmanaged memory. Because these managed objects are small, they put little pressure on the GC heap, so depending on the memory pressure on the machine the GC may or may not collect them promptly. Consider implementing the Dispose pattern so that these expensive unmanaged resources are released eagerly instead of when the GC eventually collects their managed wrappers.
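The comparison in steps 2 and 3 can be automated once the two counters have been exported as evenly spaced samples (for example, from a perfmon log). The sketch below is a minimal illustration under that assumption, not a CLR tool. It fits a least-squares slope to each series and compares the two growth rates. The function names and the 0.8 and 0.05 ratio thresholds are hypothetical choices, not values from this document, so tune them for your application.

```python
from statistics import mean

def slope(samples):
    """Least-squares slope of evenly spaced samples (units per sample interval)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    x_bar = (n - 1) / 2
    y_bar = mean(samples)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(samples))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den

def classify_leak(private_bytes, gc_heap_bytes, same_rate=0.8, flat=0.05):
    """Compare growth of '# Bytes in all Heaps' with growth of 'Private Bytes'.

    same_rate and flat are hypothetical thresholds on the ratio of the slopes.
    """
    pb_slope = slope(private_bytes)
    gc_slope = slope(gc_heap_bytes)
    if pb_slope <= 0:
        return "no growth"
    ratio = gc_slope / pb_slope
    if ratio >= same_rate:
        return "managed leak"                              # step 2, then step 5
    if ratio > flat:
        return "managed objects holding unmanaged memory"  # step 3
    return "unmanaged leak"                                # step 4
```

A GC heap that grows at roughly the same rate as Private Bytes points to a managed leak. A flat GC heap points to the unmanaged case described in step 4.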
4) If the GC Heap is not growing over a period of time, you have an unmanaged memory leak. Snapshot the application's working set at regular intervals with vadump: run vadump -o -p <pid> and redirect the output to a log file. The summary at the end of the log file looks like Figure 1. The two interesting numbers are “Heap”, which is the NT process heap, and “Other Data”, which contains the GC Heap. If “Heap” grows steadily, you have an unmanaged memory leak[1]. If “Other Data” grows but the GC Heap perf counter (“.NET CLR Memory\# Bytes in all Heaps”) does not, you have an unmanaged memory leak stemming either from calls to VirtualAlloc/VirtualAllocEx or from large (>256 KB) allocations on the NT process heap. Use traditional memory leak tools to analyze these problems. For example, !heap 0 would show calls to HeapAlloc with no corresponding calls to HeapFree.
Category                            Total   Private Shareable    Shared
                          Pages    KBytes    KBytes    KBytes    KBytes
Page Table Pages             29       116       116         0         0
Other System                  8        32        32         0         0
Code/StaticData            1806      7224      1040      3732      2452
Heap                        201       804       804         0         0
Stack                         9        36        36         0         0
Teb                           4        16        16         0         0
Mapped Data                 141       564         0        68       496
Other Data                  168       672       668         4         0
Total Modules              1806      7224      1040      3732      2452
Total Dynamic Data          523      2092      1524        72       496
Total System                 37       148       148         0         0
Grand Total Working Set    2366      9464      2712      3804      2948
Figure 1: VADump output
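Step 4 compares the “Heap” and “Other Data” rows across successive vadump -o logs, and that comparison is easy to script. This minimal sketch assumes each summary row has the layout shown in Figure 1: a category name followed by five integer columns (Pages, then Total, Private, Shareable and Shared KBytes). The function names are hypothetical.

```python
import re

# A category name followed by five integers: Pages, Total KB, Private KB,
# Shareable KB, Shared KB (layout assumed from Figure 1).
ROW = re.compile(r"^\s*(.+?)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")

def parse_vadump_summary(text):
    """Map each summary category to its total KBytes."""
    totals = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            totals[m.group(1)] = int(m.group(3))  # group 3 is Total KBytes
    return totals

def growth_kb(before_text, after_text, categories=("Heap", "Other Data")):
    """KBytes gained by each category between two vadump -o summaries."""
    before = parse_vadump_summary(before_text)
    after = parse_vadump_summary(after_text)
    return {c: after.get(c, 0) - before.get(c, 0) for c in categories}
```

Run it on logs captured at regular intervals. Steady growth in “Heap”, or in “Other Data” while the GC heap counter stays flat, points to the corresponding unmanaged leak described in step 4.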
5) Once you have established that the GC Heap is leaking memory, the next step is to identify the cause of the leak. You can approach the problem in two steps. First, identify which Type(s) of object are leaking. SOS[2] is an ntsd/windbg extension that can help examine the GC Heap. For example:
0:001> !DumpHeap -stat -min 100
total 1992 objects
Statistics:
MT Count TotalSize Class Name
d750604 1 116 System.Double[]
d0373c 1 140 System.Boolean[]
d9198dc 2 248 System.Web.HttpCachePolicy
d02edc 2 264 System.UInt64[]
<snip . . . >
d026a0 241 49868 System.Int32[]
d02960 316 67272 System.Collections.Hashtable/bucket[]
d02364 22 106364 System.Char[]
d02c2c 78 249752 System.Byte[]
79b4f3f8 995 331976 System.String
Total 1992 objects
large objects
Address MT Size
90d9350 79b4f3f8 87492 System.String
<snip . . . >
90984d8 d0209c 4096 System.Object[]
90ef980 d0209c 2064 System.Object[]
total 16 large objects
More help on SOS commands is available by typing “SOS!Help” in the ntsd/windbg command window. You can also use CLRProfiler[3] (available from the GotDotNet site) and compare the “Show Heap Now” views of two or more snapshots to identify the leaking objects. Second, identify which code allocates these Types and what keeps them alive. The GC Heap grows because there are references, or “roots”, from outside the GC Heap to the managed objects. If you are using SOS, run SOS!GCRoot <addr> to list the roots (including handles) that keep the object at <addr> alive. You can get the object addresses from the DumpHeap command. If you are using CLRProfiler, you can simply follow the references and observe how they trace back to the <root>. GC Handles can also cause a GC Heap leak; see item 8) for details.
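Diffing two DumpHeap -stat snapshots highlights the Types whose total size grows, in the same spirit as comparing CLRProfiler heap snapshots. This minimal sketch assumes the Statistics section has the MT, Count, TotalSize and Class Name layout shown above and ends at the line beginning with “Total”. Other SOS versions may format the output differently, and the function names are hypothetical.

```python
import re

# MT, Count, TotalSize, Class Name (layout assumed from the sample output)
STAT_ROW = re.compile(r"^\s*([0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(.+?)\s*$")

def parse_dumpheap_stat(text):
    """Return {class name: (count, total size)} from the Statistics table."""
    stats, in_stats = {}, False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("MT"):
            in_stats = True              # column header of the Statistics table
            continue
        if in_stats and stripped.startswith("Total"):
            break                        # end of the Statistics table
        m = STAT_ROW.match(line) if in_stats else None
        if m:
            count, size, name = int(m.group(2)), int(m.group(3)), m.group(4)
            prev_count, prev_size = stats.get(name, (0, 0))
            stats[name] = (prev_count + count, prev_size + size)  # merge same-name MTs
    return stats

def top_growth(before_text, after_text, n=5):
    """Largest (size delta, count delta, class name) entries, biggest first."""
    before = parse_dumpheap_stat(before_text)
    after = parse_dumpheap_stat(after_text)
    deltas = []
    for name, (count, size) in after.items():
        old_count, old_size = before.get(name, (0, 0))
        deltas.append((size - old_size, count - old_count, name))
    deltas.sort(reverse=True)
    return deltas[:n]
```

The Types at the top of the list are good candidates for SOS!GCRoot on a few of their instances.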
6) Sometimes the Finalizer thread can fall behind in finalizing objects. This can happen either because the Finalizer thread is blocked (e.g. if the main thread is marked STA and does not pump messages, so the Finalizer thread cannot complete finalizers that need to call into that thread) or because it is slowed down by heavy load from other threads of similar or higher priority. The GC heap then grows, which looks like a leak. If the Finalizer thread gets a chance to run, this “leak” disappears. A combination of calls to GC.Collect and GC.WaitForPendingFinalizers (typically GC.Collect, then GC.WaitForPendingFinalizers, then GC.Collect again) can also empty the finalization queue. WARNING: Use these calls ONLY to confirm this theory (i.e. that the Finalizer thread is falling behind). If you find that a blocked STA thread is the cause, consider marking your thread as MTA. If excessive load causes the Finalizer thread to lag behind, the GC steps up collections, and eventually, as memory pressure on the system grows[4], the memory is collected.
7) Pinning can cause the perception of a memory leak. See item 3) in the next section for more details.
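8) GC Handle leaks can also cause GC memory to leak. GC Handles are handles to managed objects. They leak when code creates them (for example, with GCHandle.Alloc) and never frees them, and each leaked normal or pinned handle keeps its target object alive. Run SOS!objsize without arguments to list the handles. If this list is large and growing, you have a GC Handle leak.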
PROBLEM: My Application’s Virtual Bytes Grow Indefinitely
Or
PROBLEM: My Application Experiences Excessive Memory Fragmentation
Or
PROBLEM: My Application Crashes with an Out-of-Memory Exception (while there is still a lot of memory available on the machine)
On 32-bit machines, the virtual address space of an application is limited to 2 GB[5]. Long-running applications can load enough modules and allocate enough memory to run out of virtual address space. There are several ways to confirm that your application has hit this ceiling. Observe the “Process\Virtual Bytes” perf counter or the “Virtual Memory Size” column in Task Manager. If it is close to the 2 GB limit, the application is very close to throwing an out-of-memory exception. When the GC needs a new segment, it tries to allocate a contiguous memory block of SEGMENT_SIZE. Under memory stress this allocation can fail, and the managed allocation request then fails with an exception. With server GC, SEGMENT_SIZE is 16 MB for the normal heap and 8 MB for the Large Object Heap on machines with more than 8 processors, 32 MB and 16 MB respectively with 5 to 8 processors, and 64 MB and 32 MB respectively with 4 or fewer processors. With workstation GC, it is always 16 MB for both.
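The segment sizes above fit in a small lookup. This is handy for checking whether the largest free region reported by vadump -v can still hold a new segment. The sketch encodes only the values stated in the paragraph above. The function names are hypothetical, and other CLR versions may use different sizes.

```python
MB = 1024 * 1024

def segment_sizes_mb(server_gc, processors):
    """Return (normal heap, Large Object Heap) SEGMENT_SIZE in MB, per the text."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    if not server_gc:
        return (16, 16)   # workstation GC: always 16 MB / 16 MB
    if processors > 8:
        return (16, 8)
    if processors > 4:
        return (32, 16)   # 5 to 8 processors
    return (64, 32)       # 4 or fewer processors

def can_reserve_new_segment(largest_free_region_bytes, server_gc, processors,
                            large_object=False):
    """True if one contiguous free region is big enough for another segment."""
    normal_mb, loh_mb = segment_sizes_mb(server_gc, processors)
    needed = (loh_mb if large_object else normal_mb) * MB
    return largest_free_region_bytes >= needed
```

Note that the check is against the largest contiguous free region, not the total free space. A fragmented address space can have hundreds of MB free and still be unable to hold one 64 MB segment.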
1) Snapshot the virtual address space with vadump -v -p <pid> at regular intervals. You can also use the !inetdbg.vmmap command in ntsd/windbg. Look for unusual allocation patterns; you can windiff the vadump logs to find them. Note that in Windows NT and XP, virtual address space is reserved at 64 KB boundaries (the allocation granularity). As a result, module loads or VirtualAlloc calls smaller than 64 KB “waste” the rest of their 64 KB block. Packing these regions together helps.
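The cost of the 64 KB granularity is simple arithmetic: each reservation occupies address space up to the next 64 KB boundary. This sketch assumes the region sizes have already been read from the vadump -v log and ignores page-size rounding. It estimates the address space lost by a set of separate reservations and shows why packing them helps. The function names are hypothetical.

```python
GRANULARITY = 64 * 1024  # reservation granularity on Windows NT/XP, per the text

def reserved_span(size):
    """Address space consumed by one reservation of `size` bytes."""
    if size <= 0:
        raise ValueError("size must be positive")
    return -(-size // GRANULARITY) * GRANULARITY  # round up to a 64 KB multiple

def wasted_bytes(sizes):
    """Address space lost to 64 KB rounding across separate reservations."""
    return sum(reserved_span(s) - s for s in sizes)

# Sixteen separate 4 KB reservations versus one packed 64 KB reservation.
print(wasted_bytes([4 * 1024] * 16))  # 983040 bytes (960 KB)
print(wasted_bytes([64 * 1024]))      # 0
```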
2) If visual inspection of the memory regions or allocation patterns does not provide clues, you might have to set a breakpoint on kernel32!VirtualAllocEx and examine the stack traces. This process can be automated with post-processing tools.
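As one example of such post-processing, the sketch below groups the breakpoint hits by their innermost stack frames and totals the requested sizes, so the heaviest callers of VirtualAllocEx appear first. It assumes you have already extracted each hit from the debugger log into a (size, frames) pair, with frames listed innermost-first. That record shape, the `depth` parameter and the function name are hypothetical.

```python
from collections import defaultdict

def summarize_allocations(records, depth=3):
    """Group (size, frames) breakpoint hits by their innermost `depth` frames.

    `frames` lists the call stack innermost-first, as a debugger prints it.
    Returns (total bytes, call count, frames) tuples, largest total first.
    """
    totals = defaultdict(lambda: [0, 0])  # frames -> [calls, bytes]
    for size, frames in records:
        key = tuple(frames[:depth])
        totals[key][0] += 1
        totals[key][1] += size
    return sorted(((nbytes, calls, key) for key, (calls, nbytes) in totals.items()),
                  reverse=True)
```

Increasing `depth` separates callers that share the same immediate wrapper around VirtualAllocEx.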
3) Fragmentation can occur in the GC heap when objects are pinned for significant periods of time. When this occurs, the pinned objects may get promoted to generation 2. Over time, other objects around the pinned objects get collected, leaving free space around them. Because new allocations always occur in generation 0, the free space in these Gen 2 memory segments is wasted. This can increase OS memory usage substantially even when the total memory in all generations (as viewed via perf counters) is proportionately low. The best way to verify this is with the SOS debugger extension's DumpHeap command, which gives you a breakdown of the “Free” space in the heap. If this is significant, work with the application developer to avoid keeping objects pinned for long periods of time; objects should be pinned for as short a duration as possible.
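The “Free” share of the heap can be computed directly from DumpHeap -stat output. This minimal sketch assumes the Statistics layout shown earlier in this document (MT, Count, TotalSize, Class Name), with free blocks listed under the class name “Free”. It also assumes DumpHeap was run without a size filter such as -min. The function name is hypothetical, and what counts as “significant” is left to your judgment.

```python
import re

# MT, Count, TotalSize, Class Name (layout assumed from the sample output)
STAT_ROW = re.compile(r"^\s*[0-9a-fA-F]+\s+\d+\s+(\d+)\s+(.+?)\s*$")

def free_fraction(dumpheap_stat_text):
    """Fraction of the bytes in the Statistics table that belong to "Free"."""
    free = total = 0
    in_stats = False
    for line in dumpheap_stat_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("MT"):
            in_stats = True          # column header of the Statistics table
            continue
        if in_stats and stripped.startswith("Total"):
            break                    # end of the Statistics table
        m = STAT_ROW.match(line) if in_stats else None
        if m:
            size, name = int(m.group(1)), m.group(2)
            total += size
            if name == "Free":
                free += size
    return free / total if total else 0.0
```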
[1] If you know of a perf counter that tracks the NT process heap usage of a process, please send feedback.
[2] SOS was called Strike in CLR V1.0 and V1.1
[3] Note that SOS debugger extensions and CLRProfiler complement each other.
[4] When 90% of the physical RAM is used
[5] Applications can also run in 3 GB mode (the /3GB boot option, for executables marked large-address-aware), in which case all of these arguments apply with the higher limit.