Tools to troubleshoot Distributed Cache SharePoint 2013

Tools to troubleshoot Distributed Cache issues:

====================================

First, make sure that your Cache Cluster is configured following best practices. You can use the script from:

https://blogs.technet.com/b/filipbosmans/archive/2015/01/04/troubleshooting-distributed-cache-for-sharepoint-2013-on-premise.aspx

Reduce complexity of your setup:

=====================

Revert to 1 Cache Host if possible.

If you have intermittent issues, or the Cluster is not down, for instance, reduce the Cache Hosts to a maximum of 2.
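
To see how many Cache Hosts you currently have before reducing them, something like the sketch below might help. It assumes you run it from an elevated SharePoint 2013 Management Shell, where the AppFabric cmdlets are normally available. The last two commands take the local server out of the cluster and must be run on the Cache Host you want to remove, so try this on a test farm first.

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Connect to the cache cluster and list its hosts with their status
Use-CacheCluster
Get-CacheHost

# Show which SharePoint servers run the Distributed Cache service instance
Get-SPServiceInstance |
    Where-Object { $_.TypeName -eq "Distributed Cache" } |
    Select-Object Server, Status

# Run ONLY on the Cache Host you want to remove:
# drain the cache gracefully, then remove the local service instance
Stop-SPDistributedCacheServiceInstance -Graceful
Remove-SPDistributedCacheServiceInstance
```

Add-SPDistributedCacheServiceInstance on the same server should bring the host back once you are done troubleshooting.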

If you can trigger the issue, you can go all the way with increasing log levels and so on.

If you can't, you need to take into account that logs will become massive over time, so fine-tuning is recommended.

There are too many scenarios that might need different log sets to list them all here, so adjust accordingly. I added things that you might not need at all, and you can also combine both approaches below.

Able to trigger the issue:

================

  • Set up Performance Counters for AppFabric, take a baseline set from morning till evening.
  • Take another set from the day you trigger.
  • Increase Tracing to VerboseEx and Events to Verbose through PowerShell.

Set-SPLogLevel -TraceSeverity VerboseEx -EventSeverity Verbose

  • Check if AppFabric logs are enabled:

"Event Viewer" - "Applications and Services Logs" - "Microsoft" - "Windows" - "Application Server–System Services" - "Microsoft-Windows-Application Server–System Services":

Admin, Analytic, Operational, Debug.

To get the Debug and Analytic logs, select "Microsoft-Windows-Application Server–System Services", and from the top menu, choose "View" - "Show Analytic and Debug Logs".

Once all 4 logs are showing, disable them, increase the size from the default 1 MB to perhaps 5 or 10 MB (right-click - "Properties" - "General" tab), and then enable them again.
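
If you prefer to check this from PowerShell, a minimal sketch could look like the one below. The wildcard is an assumption so that the exact channel names don't have to be typed. Verify that the result really lists the 4 AppFabric logs on your server.

```powershell
# List the AppFabric event logs with their enabled state and maximum size.
# -Force is needed to include Analytic and Debug logs when using wildcards.
Get-WinEvent -ListLog "*Application Server*" -Force -ErrorAction SilentlyContinue |
    Select-Object LogName, IsEnabled, MaximumSizeInBytes, RecordCount |
    Format-Table -AutoSize
```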

  • Set up Network Monitor on all servers you think are involved.

For instance, with a communication issue, run it on all Cache Hosts and perhaps even on your SQL Server.

  • Enable logman for AppFabric:

logman create trace CacheETWTracing -p "Microsoft-Windows Server AppFabric Caching" -o c:\temp\%computername%sAppFabricCacheTrace.etl -ow -f bin

logman start CacheETWTracing

Reproduce issue

logman stop CacheETWTracing

  • If you can trigger the issue by accessing a site, for instance, enable the Developer Dashboard if allowed. If not, take a Fiddler trace or use the IE F12 developer tools instead, and look up the Correlation ID.
  • Reproduce.
  • Take limited merged ULS logs, and make sure to include at least a couple of minutes before the repro.

Merge-SPLogFile -Path "C:\FarmMergedLog.log" -Overwrite -StartTime (Get-Date).AddMinutes(-3)

AddMinutes(-3) takes the logs from 3 minutes before the moment you run the Merge command up to that moment.
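
If you found a Correlation ID, you can probably narrow the merged log down even further with the -Correlation parameter of Merge-SPLogFile. A sketch, where the GUID and the output path are placeholders you need to replace:

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Placeholder: paste the Correlation ID from the Developer Dashboard, Fiddler or F12 trace
$correlationId = "00000000-0000-0000-0000-000000000000"

# Merge only the ULS entries from the last 10 minutes that carry this Correlation ID
Merge-SPLogFile -Path "C:\FarmMergedLog_Correlation.log" -Overwrite `
    -Correlation $correlationId -StartTime (Get-Date).AddMinutes(-10)
```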

  • Take the Application and AppFabric Events, filtered on "Last hour", and make sure to analyze them on a SharePoint Server test machine.
  • Take the logman trace and open it with Message Analyzer on a SharePoint Server test machine (otherwise it won't find the libraries), or save it as a .csv file.
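
For the .csv option, tracerpt (which ships with Windows) can convert an .etl file. This is only a sketch: both paths are placeholders, and $etl must point to the file that logman actually wrote. As with Message Analyzer, the events may decode better on a machine that has the AppFabric libraries installed.

```powershell
# Placeholder paths: point $etl at the trace file created by logman
$etl = "C:\temp\AppFabricCacheTrace.etl"
$csv = "C:\temp\AppFabricCacheTrace.csv"

# Convert the binary trace to CSV; -y overwrites existing output without prompting
tracerpt $etl -o $csv -of CSV -y
```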

 

Not able to trigger the issue:

===================

  • Increase tracing to Verbose(Ex) and Events to Verbose only for "SharePoint Server" - "Distributed Cache" (if you expect that logs related to the issue will surface there).

Set-SPLogLevel -TraceSeverity Verbose -Identity "Distributed Cache" -EventSeverity Verbose

  • Check if AppFabric logs are enabled.
  • Set up Performance Counters for AppFabric, and take a baseline for a prolonged period based on when the issue usually occurs.
  • After the baseline counters, start a fresh batch of performance counters, so that you can compare the set from when the issue occurred with the baseline.
  • If you encounter the error/issue, look in ULS and Events and take a merged log with filtered Events from that time (if possible, only from the relevant servers).

Merge-SPLogFile -Path "C:\FarmMergedLog.log" -Overwrite -StartTime "06/09/2008 16:00" -EndTime "06/09/2008 16:15"
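
Instead of typing both timestamps by hand, you could derive the window from the time the issue was seen and reuse it for the Events. A sketch, assuming a made-up incident time and output paths; the window sizes are arbitrary too:

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Placeholder: the moment the error was observed
$incident = [datetime]"2015-01-04 16:05"
$start    = $incident.AddMinutes(-10)
$end      = $incident.AddMinutes(5)

# Merged ULS log for that window
Merge-SPLogFile -Path "C:\FarmMergedLog.log" -Overwrite -StartTime $start -EndTime $end

# Application events of the local server for the same window
Get-WinEvent -FilterHashtable @{ LogName = "Application"; StartTime = $start; EndTime = $end } `
    -ErrorAction SilentlyContinue |
    Select-Object TimeCreated, Id, LevelDisplayName, ProviderName, Message |
    Export-Csv -Path "C:\ApplicationEvents.csv" -NoTypeInformation

# When you are done, reset all log levels to their defaults
Clear-SPLogLevel
```

Resetting the levels matters here, because Verbose logging left on for weeks is exactly how the logs become massive.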

 

If you have too many servers, you can set up a Subscription on 1 server to collect the Event logs from all your servers there.

Event Subscriptions

https://technet.microsoft.com/en-us/library/cc749183.aspx
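
The basic preparation for a collector-initiated Subscription might look like the sketch below. It assumes default WinRM settings and that you create the Subscription itself in Event Viewer afterwards. The collector's computer account usually also needs read access to the logs on the source servers (for example through the "Event Log Readers" group).

```powershell
# On every source server: enable Windows Remote Management
winrm quickconfig -q

# On the collector server: configure the Windows Event Collector service
wecutil qc /q

# On the collector server: list the Subscriptions that exist so far
wecutil es
```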

Documentation and downloads:

=====================

Network Monitor

https://www.microsoft.com/en-us/download/details.aspx?id=4865

Message Analyzer

https://www.microsoft.com/en-us/download/details.aspx?id=44226

ULS Viewer

https://www.microsoft.com/en-us/download/details.aspx?id=44020

AppFabric Performance Counters

https://msdn.microsoft.com/en-us/library/ff637725(v=azure.10).aspx

Health Monitoring AppFabric

https://msdn.microsoft.com/en-us/library/ff921010(v=azure.10).aspx

Troubleshoot AppFabric

https://msdn.microsoft.com/en-us/library/ee790821.aspx

Keep in mind not all the actions listed here are allowed in a SharePoint setup.

Logging and Counters in App Fabric Cache

https://blogs.msdn.com/b/appfabriccat/archive/2010/12/14/logging-in-app-fabric-cache.aspx

Logman

https://technet.microsoft.com/en-us/library/bb490956.aspx
