Tools to troubleshoot Distributed Cache SharePoint 2013
Tools to troubleshoot Distributed Cache issues:
====================================
First, make sure that your Cache Cluster is configured according to best practices. You can use the script:
Reduce complexity of your setup:
=====================
Revert to 1 Cache Host if possible.
If the issue is intermittent, or the Cluster is not completely down, reduce the number of Cache Hosts to a maximum of 2.
If you can trigger the issue, you can go all the way with increasing log levels and so on.
If you can't, take into account that logs will become massive over time, so fine-tuning is recommended.
There are too many scenarios to list here, and each might need a different log set, so adjust accordingly. The lists below include steps that you might not need at all, and you can combine steps from both.
Able to trigger the issue:
================
- Set up Performance Counters for AppFabric and take a baseline set from morning till evening.
- Take another set on the day you trigger the issue.
- Increase Tracing to VerboseEx and Events to Verbose through PowerShell.
Set-SPLogLevel -TraceSeverity VerboseEx -EventSeverity Verbose
- Check if AppFabric logs are enabled:
"Event Viewer" - "Applications and Services Logs" - "Microsoft" - "Windows" - "Application Server–System Services" - "Microsoft-Windows-Application Server–System Services":
Admin, Analytic, Operational, Debug.
To get the Debug and Analytic logs, select "Microsoft-Windows-Application Server–System Services", and from the top menu choose "View" - "Show Analytic and Debug Logs".
Once all 4 logs are showing, disable them, increase the maximum size from the default 1 MB to perhaps 5 or 10 MB (right-click - "Properties" - "General" tab), and then enable them again.
- Set up Network Monitor on all servers you think are involved.
For instance, with a communication issue, run it on all Cache Hosts and perhaps even on your SQL Server.
- Enable logman for AppFabric:
logman create trace CacheETWTracing -p "Microsoft-Windows Server AppFabric Caching" -o c:\temp\%computername%sAppFabricCacheTrace.etl -ow -f bin
logman start CacheETWTracing
Reproduce issue
logman stop CacheETWTracing
- If you can trigger the issue by accessing a site, for instance, enable the Developer Dashboard if that is allowed. If not, take a Fiddler trace or use the IE F12 developer tools instead, and look up the CorrelationID.
- Reproduce.
- Take limited merged ULS logs, make sure to include at least a couple of minutes before repro.
Merge-SPLogFile -Path "C:\FarmMergedLog.log" -Overwrite -StartTime (Get-Date).AddMinutes(-3)
AddMinutes(-3) collects the logs from 3 minutes before the moment you run the Merge command up to that moment.
- Take the Application and AppFabric Events, filter on "Last Hour", and make sure to analyze them on a SharePoint Server test machine.
- Take the logman trace and open it in Message Analyzer on a SharePoint Server test machine (otherwise it won't find the libraries), or save it as a .csv file.
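The Event Viewer step for the AppFabric logs can also be scripted. What follows is a minimal sketch, not a definitive procedure. It assumes that the four channels match the wildcard below (check the names that the first command returns on your own Cache Host) and that you run it from an elevated PowerShell session. The variable names and the 10 MB size are illustrative.

```powershell
# Assumption: the AppFabric channels match this wildcard on your server.
$pattern = '*Application Server*System Services*'
$maxSize = 10MB   # the post suggests 5 or 10 MB instead of the default 1 MB

# -Force also returns Analytic and Debug channels, which are hidden by default.
$logs = Get-WinEvent -ListLog $pattern -Force -ErrorAction SilentlyContinue

if (-not $logs) {
    Write-Warning "No event log channel matches '$pattern' on this machine."
}

foreach ($log in $logs) {
    # Disable first, as in the manual steps, then resize and enable again.
    $log.IsEnabled = $false
    $log.SaveChanges()

    $log.MaximumSizeInBytes = $maxSize
    $log.IsEnabled = $true
    $log.SaveChanges()
}

# Show the resulting state of each channel.
Get-WinEvent -ListLog $pattern -Force -ErrorAction SilentlyContinue |
    Format-Table LogName, IsEnabled, MaximumSizeInBytes -AutoSize
```

Run it on every Cache Host you collect from. When you are done troubleshooting, set IsEnabled back to $false for the Analytic and Debug channels in the same way.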
Not able to trigger the issue:
===================
- Increase tracing to Verbose (or VerboseEx) and Events to Verbose only for "SharePoint Server" - "Distributed Cache", if you expect that logs related to the issue will surface there.
Set-SPLogLevel -TraceSeverity Verbose -Identity "Distributed Cache" -EventSeverity Verbose
- Check if AppFabric logs are enabled.
- Set up Performance Counters for AppFabric and take a baseline over a prolonged period, based on when the issue usually occurs.
- After the baseline, start a fresh batch of performance counters so that you can compare the baseline with the set from when the issue occurred.
- If you encounter the error/issue, look in ULS and Events, and take a merged log with filtered Events from that time (if possible, only from the relevant servers).
Merge-SPLogFile -Path "C:\FarmMergedLog.log" -Overwrite -StartTime "06/09/2008 16:00" -EndTime "06/09/2008 16:15"
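The baseline and comparison counter sets can be collected with logman as well. This is a minimal sketch under some assumptions: the AppFabric counter sets on your Cache Host are registered under names starting with "AppFabric" (the first command lists what is actually there), and C:\temp exists. The collector name AppFabricBaseline, the file paths, and the 15 second interval are placeholders.

```powershell
# List the counter paths of all counter sets whose name starts with "AppFabric".
$paths = Get-Counter -ListSet 'AppFabric*' -ErrorAction SilentlyContinue |
    Select-Object -ExpandProperty Paths

if (-not $paths) {
    Write-Warning 'No AppFabric counter sets found on this machine.'
    return
}

# Write the counter paths to a file, one per line, for logman to read.
$counterFile = 'C:\temp\AppFabricCounters.txt'                         # placeholder path
$output      = "C:\temp\$($env:COMPUTERNAME)_AppFabricBaseline"        # placeholder path
$paths | Set-Content -Path $counterFile -Encoding ASCII

# Create a counter collector that samples every 15 seconds into a binary log.
logman create counter AppFabricBaseline -cf $counterFile -si 00:00:15 -f bin -o $output

# Start it now; stop it when the baseline period is over.
logman start AppFabricBaseline
# ... baseline period ...
# logman stop AppFabricBaseline
```

Create a second collector with a different name and output file for the comparison set, so the baseline stays untouched. With a short sample interval the file grows quickly over a prolonged period, so pick the interval with the expected duration in mind.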
If you have too many servers to check one by one, you can set up a Subscription on 1 server and collect the Event logs from all your servers there.
Event Subscriptions
https://technet.microsoft.com/en-us/library/cc749183.aspx
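If a Subscription is not in place, the Events for the same time window as the merged ULS log can be exported per server. A minimal sketch, assuming the Application log is the one you need and that C:\temp exists. The times and the output path are placeholders taken from the Merge-SPLogFile example above.

```powershell
# Same window as the Merge-SPLogFile example; the cast parses the ISO format
# independently of the regional settings of the server.
$start = [datetime]'2008-06-09T16:00:00'
$end   = [datetime]'2008-06-09T16:15:00'

$csv = "C:\temp\$($env:COMPUTERNAME)_ApplicationEvents.csv"   # placeholder path

# Filter on the server side by log name and time window, then export to CSV.
Get-WinEvent -FilterHashtable @{
        LogName   = 'Application'
        StartTime = $start
        EndTime   = $end
    } -ErrorAction SilentlyContinue |
    Select-Object TimeCreated, MachineName, ProviderName, Id, LevelDisplayName, Message |
    Export-Csv -Path $csv -NoTypeInformation
```

Get-WinEvent also accepts a -ComputerName parameter, so the same filter can be pointed at another server, provided remote event log access is allowed in your environment.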
Documentation and downloads:
=====================
Network Monitor
https://www.microsoft.com/en-us/download/details.aspx?id=4865
Message Analyzer
https://www.microsoft.com/en-us/download/details.aspx?id=44226
ULS Viewer
https://www.microsoft.com/en-us/download/details.aspx?id=44020
AppFabric Performance Counters
https://msdn.microsoft.com/en-us/library/ff637725(v=azure.10).aspx
Health Monitoring AppFabric
https://msdn.microsoft.com/en-us/library/ff921010(v=azure.10).aspx
Troubleshoot AppFabric
https://msdn.microsoft.com/en-us/library/ee790821.aspx
Keep in mind not all the actions listed here are allowed in a SharePoint setup.
Logging and Counters in App Fabric Cache
https://blogs.msdn.com/b/appfabriccat/archive/2010/12/14/logging-in-app-fabric-cache.aspx
Logman
https://technet.microsoft.com/en-us/library/bb490956.aspx
Comments
- Anonymous
February 29, 2016
Thanks for the guidance, Filip !
Great article !