SharePoint 2016: More Cache hosts are running in this deployment than are registered with SharePoint
Problem
Discovered that a SharePoint 2016 farm was generating the following series of health rule violations listed in the farm's Review problems and solutions list:
- Distributed cache service is not enabled in this deployment
- Server role configuration isn't correct
- Distributed cache service is unexpectedly configured on server(s)
- Distributed cache service is not configured on server(s)
- More cache hosts are running in this deployment than are registered with SharePoint
The farm had a three-server topology: one DB, one APP (MinRole: Application with Search), and one WFE (MinRole: Front-end with Distributed Cache). Began troubleshooting.
Analysis
01) Checked Services in Farm: found Distributed Cache service provisioned on both APP (Compliant: No (Fix)) and WFE (Compliant: Yes). Clicked Fix; after a moment, Compliant returned to No (Fix).
02) Checked Services on Server: found the Distributed Cache service on both APP and WFE with a Status of Stopped.
03) Checked Windows Server Services administrative tool on APP and WFE and found both presenting Startup Type Disabled and Status [blank] for AppFabric Caching Service. Also found that the identity of each was the farm's application service account, call it spSvc.
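The check in step 03 can also be scripted rather than read from the Services console. A minimal sketch, assuming it is run locally in an elevated PowerShell session on each server; the service name AppFabricCachingService is the one reported by the cache cluster in this post:

```powershell
# Query the AppFabric Caching Service through CIM on the local server.
# StartMode is the startup type, State the current status,
# and StartName the account the service logs on as.
Get-CimInstance -ClassName Win32_Service -Filter "Name='AppFabricCachingService'" |
    Select-Object SystemName, Name, StartMode, State, StartName
```

A StartMode of Disabled together with a State of Stopped would match what the Services tool showed here.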
04) Observation: Distributed Cache should only be running on the farm's single WFE.
05) Checked status of service on each server by executing the following in an elevated SharePoint Management Shell (SMS) on the APP server:
$instanceName = "SPDistributedCacheService Name=AppFabricCachingService"
Get-SPServiceInstance | ? {($_.Service.ToString()) -eq $instanceName} | ft Server,Id,Status -auto
the outcome of which was:
Server | ID | Status |
---|---|---|
SPServer Name=APP | [ID1] | Disabled |
SPServer Name=WFE | [ID2] | Disabled |
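As an alternative to matching on the service's string form, the instances can be selected by their display type. This is only a sketch, and it assumes the instance's TypeName reads "Distributed Cache" on this farm, which is worth confirming before relying on it:

```powershell
# Assumption: TypeName is "Distributed Cache" for these instances on this farm.
# Run in an elevated SharePoint Management Shell.
Get-SPServiceInstance |
    Where-Object { $_.TypeName -eq "Distributed Cache" } |
    Format-Table Server, Id, Status -AutoSize
```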
06) Checked cache cluster status by executing the following in an elevated SMS on the WFE server:
Use-CacheCluster
Get-CacheHost
The outcome of which was:
HostName : CachePort | Service Name | Service Status | Version Info |
---|---|---|---|
[WFE]:22233 | AppFabricCachingService | DOWN | 3 [3,3][1,3] |
07) Checked job status, for running jobs involving Distributed cache services, by executing the following in an elevated SMS on the WFE server:
Get-SPTimerJob Job-Service-Instance-[ID1]
Get-SPTimerJob Job-Service-Instance-[ID2]
both of which returned [blank]. This verified that there were no jobs scheduled to remove these services.
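Rather than querying each job by name, a wildcard search covers every service instance job in one pass. A sketch, assuming the job names follow the Job-Service-Instance-[ID] pattern used above; no output means no matching jobs exist:

```powershell
# List any timer jobs whose name starts with "job-service-instance-".
# The -like operator is case-insensitive, so the casing of the prefix does not matter.
Get-SPTimerJob |
    Where-Object { $_.Name -like "job-service-instance-*" } |
    Format-Table Name, LastRunTime -AutoSize
```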
08) Removed AppFabric service from APP server by executing the following in an elevated SMS on the WFE server (it doesn't matter which server this is run from):
(Get-SPServiceInstance -Identity "[ID1]").Delete()
$instanceName = "SPDistributedCacheService Name=AppFabricCachingService"
Get-SPServiceInstance | ? {($_.Service.ToString()) -eq $instanceName} | ft Server,Id,Status -auto
This returned a single instance of the AppFabric service: the one registered on the WFE server.
09) Removed AppFabric service from WFE server by executing the following in an elevated SMS on the WFE server (it doesn't matter which server this is run from):
(Get-SPServiceInstance -Identity "[ID2]").Delete()
$instanceName = "SPDistributedCacheService Name=AppFabricCachingService"
Get-SPServiceInstance | ? {($_.Service.ToString()) -eq $instanceName} | ft Server,Id,Status -auto
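To make the result of that final query explicit, the remaining instances can be counted. This is a sketch only, and it makes no claim about what the count was on this farm; the variable name $remaining is introduced here for illustration:

```powershell
# Run in an elevated SharePoint Management Shell.
$instanceName = "SPDistributedCacheService Name=AppFabricCachingService"

# Wrap in @() so .Count works whether zero, one, or many instances come back.
$remaining = @(Get-SPServiceInstance |
    Where-Object { $_.Service.ToString() -eq $instanceName })

"Remaining Distributed Cache service instances: $($remaining.Count)"
```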
tbd
I hope it helps.