High CPU on Wmiprvse.exe caused by a TLS slot leak in DNSPROV.DLL on Windows Server 2003

Certain customers have recently been experiencing an issue which I wanted to bring to your attention.

Issue with Windows Server 2003 SP2 Domain Controllers

Wmiprvse.exe consistently consumes a high percentage of CPU on Domain Controllers. One svchost.exe has a high handle count of around 75,000, and another svchost.exe hosting rpcss has 23,000 handles.
Impact: Servers need to be restarted on a scheduled basis.

On investigating this issue, I discovered that other customers have reported similar instances within the last 6 months.

Note: this does not occur in Windows Server 2008.

Cause

This has been traced to a problem with dnsprov.dll; see below for more details.

“A Windows Server 2003 (R2) SP2 machine, which implements a DNS role (usually true for many DCs), might become unreliable, unstable and misbehaving because of this problem. Manual intervention is needed to restore the server to its stable state each time administrators become aware of the problem going on, which can occur about once per week per DC, in an environment that implements SCOM/SCOM 2007 R2.

A Windows Server 2003 server implementing the DNS role, when it receives certain WMI queries against the DNS WMI provider, will leak a TLS slot in the WMI process that hosts the DNS WMI provider. TLS slots are a finite resource (64+1024 slots available per process) so they can be quickly exhausted if leaked. A process that has its TLS slots exhausted doesn't behave normally and can incur in any kind of problem and unexpected behaviours.

Currently observed odd behaviours caused by this specific leak are:

- 100% CPU usages in the WMI host process that incurred the exhaustion.

- Other WMI providers sharing the same WMI host process not working as expected/misbehaving

Since WMI is a system service supporting many OS functions and application, having one of its processes in an unstable state makes the entire server unreliable, as mentioned and the problem needs to be resolved manually (DC reboot or WMI subsystem restarted).

SCOM 2007 happens to have a pattern of WMI queries that triggers the problem systematically after a few days monitoring a Windows Server 2003/DNS role.”
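To get a feel for how quickly those slots can run out, here is a rough back-of-the-envelope sketch in Python. It uses the 64+1024 per-process figure from the quoted text and the one-slot-per-load/unload-cycle behaviour described in the workaround section of this post. The cycle rates and the function name are hypothetical, chosen only for illustration; they are not measured values.

```python
# Rough estimate of how long a WMI host process survives the leak.
# Assumption: exactly one TLS slot is lost per provider load/unload cycle.

TLS_SLOTS_TOTAL = 64 + 1024  # per-process limit quoted in the text above


def days_until_exhaustion(cycles_per_hour, slots_already_used=0,
                          total=TLS_SLOTS_TOTAL):
    """Return the number of days until no free TLS slot is left."""
    if cycles_per_hour <= 0:
        return float("inf")  # the provider never reloads, so nothing leaks
    free_slots = max(total - slots_already_used, 0)
    return free_slots / cycles_per_hour / 24.0


if __name__ == "__main__":
    for rate in (1, 4, 6):  # hypothetical load/unload cycles per hour
        days = days_until_exhaustion(rate)
        print(f"{rate} cycle(s)/hour -> {days:.1f} days")
```

At six cycles an hour the estimate comes out at about 7.6 days, which is in the same range as the "about once per week per DC" mentioned in the quote. Slots already taken by other components in the same host process would shorten that further.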

Workaround

Three workarounds have proved successful in several of the previously reported cases.

Considering that:

1. A TLS slot is leaked each time a load/unload cycle occurs on the WMI DNS provider dnsprov.dll

2. A WMI provider is unloaded after it has been idle for 5 minutes

3. SCOM issues DNS queries at a rate that allows the provider to unload and reload between two queries
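The three points above can be turned into a small Python sketch that counts how many load/unload cycles a given query schedule would cause. It is a simplification: it only looks at the gap between consecutive queries and the 5-minute idle timeout, and the 15-minute and 2-minute intervals are hypothetical values for illustration, not the actual SCOM schedule.

```python
# Count provider unload/reload cycles for a list of query times.
# Assumption: the provider unloads once it has been idle for longer than
# the timeout, and the next query loads it again (leaking one TLS slot).

IDLE_UNLOAD_SECONDS = 5 * 60  # idle timeout stated in point 2 above


def count_reload_cycles(query_times, idle_timeout=IDLE_UNLOAD_SECONDS):
    """query_times: ascending timestamps (seconds) of queries to the provider."""
    cycles = 0
    for previous, current in zip(query_times, query_times[1:]):
        if current - previous > idle_timeout:
            cycles += 1  # idle long enough to unload, then reloaded
    return cycles


def periodic(interval_seconds, duration_seconds):
    """Evenly spaced query times from 0 up to and including duration_seconds."""
    return list(range(0, duration_seconds + 1, interval_seconds))


if __name__ == "__main__":
    one_day = 24 * 60 * 60
    slow = count_reload_cycles(periodic(15 * 60, one_day))  # gaps exceed timeout
    fast = count_reload_cycles(periodic(2 * 60, one_day))   # gaps stay under it
    print("15-minute interval:", slow, "cycles per day")
    print("2-minute interval: ", fast, "cycles per day")
```

Any schedule whose gaps stay under the idle timeout yields zero cycles in this model, which is the idea behind keeping the provider busy so that it never unloads.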

There are 3 possible workarounds; see the details below.

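a. Execute a WMI script that uses the DNS provider to create an object and then never terminates, which prevents the provider from becoming idle and being unloaded. Note that the script below takes a different route: it changes the provider's HostingModel property so that the DNS provider runs in its own isolated wmiprvse.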

' This script changes the HostingModel property so that the Microsoft DNS WMI
' provider runs in an isolated wmiprvse, as a workaround for the TLS slot leak.

strComputer = "."
strInstance = "__Win32Provider.Name='MS_NT_DNS_PROVIDER'"
strNewHostingModel = "NetworkServiceHost:DNSSharedHost"
Dim oMicrosoftDNSNamespace 'IWbemServices
Dim oWMIProvider

' Connect to the MicrosoftDNS namespace on the target computer
Set oMicrosoftDNSNamespace = GetObject("winmgmts:" _
  & "{impersonationLevel=impersonate, (Security)}!\\" _
  & strComputer _
  & "\root\MicrosoftDNS")

' Fetch the registration instance of the DNS WMI provider
Set oWMIProvider = oMicrosoftDNSNamespace.Get(strInstance)
WScript.Echo "Provider                      : " & oWMIProvider.Name

' Update the HostingModel property only if it differs from the target value
WScript.Echo "Current value for HostingModel: " & oWMIProvider.HostingModel
If oWMIProvider.HostingModel = strNewHostingModel Then
  WScript.Echo "No need to update DNS WMI Provider HostingModel property"
Else
  oWMIProvider.HostingModel = strNewHostingModel
  ' Write the updated object back to the repository
  oWMIProvider.Put_
  WScript.Echo "New value for HostingModel    : " & oWMIProvider.HostingModel
End If

Save the script with a .vbs extension, and test it fully before applying it to live production servers. The advantage of this approach is that it could be implemented via Group Policy across the estate.

Note: This script is provided "AS IS" with no warranties, and confers no rights.

b. Isolate the DNS provider in a private wmiprvse. This can be done via the following steps:

1. Run WBEMTEST.

2. Click Connect and input root\microsoftdns in the Namespace.

3. Click Enum Classes.

4. Select Recursive and click OK.

5. From the classes list, select __Win32Provider and double click it.

6. Click Instances.

7. Select the instance and double click it.

8. Select HostingModel from the properties list and double click it.

9. Change the value from “NetworkServiceHost” to “NetworkServiceHost:DNSProvHost”

(without double quotation marks)

10. Click Save Property.

11. Click Save Object.

12. Click Close to quit WBEMTEST.

The obvious disadvantage of this is that the above steps for workaround b are manual and impractical across a large enterprise environment.

c. Write a simple rule in OpsMgr to keep the DNS provider from unloading by calling it very frequently. This appears to keep the provider from unloading, and therefore from leaking TLS slots.

Please see the following blog post, which describes this final workaround in more detail:

https://blogs.technet.com/kevinholman/archive/2009/06/29/errors-alerts-from-the-dns-mp-script-failures-wmi-probe.aspx

In most cases this will not be a problem if you patch and reboot your servers regularly. However, if you are experiencing issues, hopefully this information will help. If you are a Premier customer, I would advise raising a support case via Premier to validate the advice offered here; it also gives you a documented escalation path.

Jane
