Troubleshooting Forefront TMG 2010 Performance Issues Cheat Sheet

We encourage you to enhance this guide by identifying missing areas (scenarios, features, lifecycle...), providing links to and descriptions of existing content, and contributing new content where there are gaps. Join the community!

Scenario 1: Slow Internet Access through Forefront TMG

Potential Issue 1 

 

Task

Commands/Approach

What to look for at this stage

Get a dump of wspsrv.exe process while the issue is happening

Use the approach from this post

 

  • Also collect Perfmon data, starting while the issue is not happening and continuing until it happens.
  • Use the counters from this post (item 2)

Load the wspsrv.exe dump and quickly review all the threads within this process

On WinDBG type:

~*kb

  • Look for patterns
  • Check if there are multiple threads with similar stacks.
  • Take notes of your findings.
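Spotting repeated stacks by eye gets tedious when the process has hundreds of threads. The following is a minimal sketch, not part of the original procedure, that groups threads by their top frames. It assumes the output of the all-threads stack command (~*kb in WinDbg syntax) was saved to a text file, that each thread starts with a header containing "Id: <pid>.<tid>", and that resolved frames end with a module!function symbol. The function name and the depth parameter are hypothetical.

```python
import re
from collections import defaultdict

# Thread headers look like ".  0  Id: 9a4.9a8 Suspend: 1 Teb: ..." (the leading
# "." or "#" marker is optional).
THREAD_HEADER = re.compile(r"^\s*[.#]?\s*(\d+)\s+Id:")


def group_threads_by_stack(text, depth=5):
    """Return (stack signature, [thread indexes]) pairs, largest group first."""
    stacks = defaultdict(list)  # thread index -> symbols, top frame first
    current = None
    for line in text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = int(header.group(1))
            stacks[current]  # register the thread even if no frame resolves
            continue
        tokens = line.split()
        # Frame lines end with "module!function+0xoffset"; drop the offset.
        if current is not None and tokens and "!" in tokens[-1]:
            stacks[current].append(tokens[-1].split("+")[0])
    groups = defaultdict(list)
    for thread, frames in stacks.items():
        groups[tuple(frames[:depth])].append(thread)
    # Many threads sharing one signature is the pattern worth noting.
    return sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)
```

A group containing many threads that share the same top frames is a good candidate for closer inspection in the debugger.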

Check for critical sections

On WinDBG type:

!cs -l

  • Check if there are locked critical sections and note the owning thread of each.

Dump the information related to the critical section that is locked

On WinDBG type:

!cs -o <owning thread>

  • Compare the threads that own critical sections with the ones that appeared in the ~*kb output.
  • Is there a pattern?
  • Is there suspicious activity in one of those threads?
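The comparison between critical-section owners and thread stacks can be sketched as a small cross-reference. This example is an illustration only: it assumes the critical-section listing was saved as text and uses the field layout that !cs prints ("Critical section = ...", a "LOCKED" line, "OwningThread = 0x..."). Both function names are hypothetical, and the mapping of debugger thread index to OS thread ID is assumed to come from the thread headers ("Id: <pid>.<tid>").

```python
def locked_section_owners(cs_output):
    """Return {critical section description: owning thread ID} for LOCKED sections."""
    owners = {}
    section = None
    locked = False
    for raw in cs_output.splitlines():
        line = raw.strip()
        if line.startswith("Critical section"):
            section = line.split("=", 1)[1].strip()
            locked = False
        elif line == "LOCKED":
            locked = True
        elif line.startswith("OwningThread") and section and locked:
            # The owner is printed in hex, for example 0x000009b0.
            owners[section] = int(line.split("=", 1)[1].strip(), 16)
    return owners


def threads_owning_locks(owners, thread_ids):
    """Map debugger thread index -> sections it owns, given {index: OS thread ID}."""
    by_tid = {tid: index for index, tid in thread_ids.items()}
    result = {}
    for section, tid in owners.items():
        if tid in by_tid:
            result.setdefault(by_tid[tid], []).append(section)
    return result
```

Threads that show up here and also belong to a large group of identical stacks are the first ones to examine.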

 

**Sample Article:**

http://blogs.technet.com/b/yuridiogenes/archive/2010/10/20/high-processor-utilization-by-wspsrv-exe-process-on-tmg-2010.aspx

Potential Issue 2

 

Task

Commands/Approach

What to look for at this stage

If the critical section check doesn’t show any results but there are still suspicious patterns in the user mode dump, start reviewing Perfmon data.

Use TMG PAL while reviewing Perfmon Data.

 

  • Review PAL Report.
  • Make sure to address the recommendations from PAL’s report.

Don’t focus only on TMG counters; also review Windows core counters

Memory\*, Processor\*, Network Interface\*, Process\*, PhysicalDisk\*, Thread\*

Enable Netlogon logging to review potential authentication issues:

nltest /dbflag:0x2080ffff

  • Review best practices for each core component.
  • Make sure to understand the side effects of a disk bottleneck on TMG by reading this post.
  • Authentication and name resolution can also cause issues of this nature.
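One way to capture the Windows core counters listed above is a Perfmon data collector created with logman. The sketch below only builds the command line; it is not the collection method from the referenced posts. The collector name, output path, 15-second sample interval, and function name are all assumptions chosen for illustration.

```python
# Windows core counter objects named in this guide, written as counter paths.
CORE_COUNTERS = (
    r"\Memory\*",
    r"\Processor(*)\*",
    r"\Network Interface(*)\*",
    r"\Process(*)\*",
    r"\PhysicalDisk(*)\*",
    r"\Thread(*)\*",
)


def logman_create_command(name, output_path, interval_seconds=15,
                          counters=CORE_COUNTERS):
    """Build an argument list for 'logman create counter' (hypothetical helper)."""
    hours, rest = divmod(interval_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    interval = f"{hours:02d}:{minutes:02d}:{seconds:02d}"
    return ["logman", "create", "counter", name,
            "-c", *counters,       # counter paths to collect
            "-si", interval,       # sample interval as hh:mm:ss
            "-o", output_path]     # output log path
```

The resulting list can be passed to subprocess.run on the TMG server, followed by "logman start <name>" to begin the collection.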

 

 

**Sample Article:**

http://blogs.technet.com/b/yuridiogenes/archive/2010/11/16/hey-dc-are-you-still-there.aspx

Scenario 2: Firewall Service Crash 

 

Task

Commands/Approach

What to look for at this stage

Make sure to attach a debugger to the wspsrv.exe process in order to collect the dump when the service crashes

Use ADPlus or DebugDiag

 







  • Make sure to correctly install and configure one of those tools in order to collect the dump.

Load the crash dump on WinDBG

On WinDBG type:

!analyze -v

  • Carefully read the command output
  • Review the exception record
  • Review the faulting module

If the faulting module is not a TMG component, find out who owns it

On WinDBG type:

lmvm <faulting module>

  • Review the module’s timestamp; sometimes there is a newer version that already fixes the issue.
  • If the issue is caused by a third-party module, refer to the vendor’s site for more support info.
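The ownership and timestamp checks can be sketched as a small parser over saved lmvm output. This is an illustration under assumptions: it relies on the indented "Key: value" lines that lmvm prints for a module (such as Timestamp and CompanyName, which only appear when the module carries version information), and both function names are hypothetical.

```python
def parse_lmvm(output):
    """Collect the indented 'Key: value' lines of lmvm output into a dict."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        # Only indented lines carry module details; the address line is skipped.
        if sep and key.startswith((" ", "\t")) and value.strip():
            info[key.strip()] = value.strip()
    return info


def needs_vendor_check(info, vendor="Microsoft"):
    """True when CompanyName is missing or does not mention the given vendor."""
    return vendor.lower() not in info.get("CompanyName", "").lower()
```

When needs_vendor_check returns True, compare the Timestamp value with the latest release from the module's vendor before digging further.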

**Sample Articles:**

http://blogs.technet.com/b/yuridiogenes/archive/2009/08/20/isa-server-firewall-service-crashed-but-why.aspx

http://blogs.technet.com/b/yuridiogenes/archive/2008/08/13/capturing-an-user-mode-crash-on-isa-server-part-2-of-2.aspx

Scenario 3: TMG Stops Responding

 

Task

Commands/Approach

What to look for at this stage

First, determine whether it is TMG or the Windows OS that stops responding

Get answers to the following questions:





1) What do you do to bring the server back into production?

2) Does the server come back into production if you restart the Firewall Service?

3) How frequently does this issue happen?

 

  • The goal here is really to understand if this is a hang on TMG or if the OS is the one giving up and as a side effect TMG stops answering.
  • Review TMG Alerts; check if there are flood mitigation events.

If the whole server stops answering, get a complete (or kernel) memory dump

Use the approach from this article to configure the server.

 

  • Make sure to review each step of this article when preparing to collect a kernel memory dump

Load the kernel dump on WinDBG

On WinDBG type:

!locks

  • This command will list the kernel ERESOURCE locks
  • The goal is to see if there is something locked in kernel mode
  • Review the output to see if there are shared resources with a high contention count
  • Check whether there are threads waiting for that resource
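When the lock listing is long, ranking resources by contention helps decide where to look first. The following is a minimal sketch, assuming the !locks output was saved as text and that each resource block starts with a "Resource @" line followed by "Contention Count = N" and waiter-count lines. The function name and the min_contention threshold are hypothetical.

```python
import re

COUNTER_LINE = re.compile(
    r"^\s*(Contention Count|NumberOfSharedWaiters|NumberOfExclusiveWaiters)"
    r"\s*=\s*(\d+)"
)


def contended_resources(locks_output, min_contention=1):
    """Return resource records sorted by contention count, highest first."""
    resources = []
    current = None
    for line in locks_output.splitlines():
        if line.startswith("Resource @"):
            # Start a new record; counters default to zero until seen.
            current = {"header": line.strip(), "Contention Count": 0,
                       "NumberOfSharedWaiters": 0,
                       "NumberOfExclusiveWaiters": 0}
            resources.append(current)
            continue
        match = COUNTER_LINE.match(line)
        if match and current is not None:
            current[match.group(1)] = int(match.group(2))
    busy = [r for r in resources if r["Contention Count"] >= min_contention]
    return sorted(busy, key=lambda r: r["Contention Count"], reverse=True)
```

Resources at the top of the list that also report waiters are the ones whose owning threads are worth dumping next.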

Once you find a thread that might be waiting for a resource, dump the thread

On WinDBG type:

!thread <thread address>

  • Review the thread information
  • Look for potential IRPs

If the thread has IRPs, dump the IRP

On WinDBG type:

!irp <irp address>

  • Check if the IRP is pending on a lower level driver.
  • Check if there is a third-party driver holding that resource.
  • Use lmvm to dump info about the module

**Sample Articles:**

http://blogs.technet.com/b/yuridiogenes/archive/2008/08/22/antivirus-and-isa-server.aspx

http://blogs.technet.com/b/yuridiogenes/archive/2010/11/15/we-are-all-waiting-for-you-mr-disk-are-you-there.aspx

http://blogs.technet.com/b/yuridiogenes/archive/2010/09/19/the-curious-case-of-tmg-stopping-responding-in-random-days-but-always-during-the-morning.aspx

Download the PDF version of this Cheat Sheet from here.

This article was originally written by: 

**Yuri Diogenes, Senior Technical Writer**

Windows Server iX | IT Pro Security

Microsoft Corporation

--------

Yuri’s Blog: http://blogs.technet.com/yuridiogenes

Team’s Blog: http://blogs.technet.com/b/securitycontent

Twitter: http://twitter.com/yuridiogenes

Forefront TMG Wiki Portal Page