SQL Server 2017 Troubleshooting: File share witness failed to arbitrate for the file share

Problem

While performing a routine check of a SharePoint 2016 staging farm's Always On Availability Group, I discovered that every database under the secondary SQL Server's Databases node was marked "(Not Synchronizing)".

Event viewer

System event log

In the primary server's System event log (I had remoted into the primary, but was connected through SSMS to the secondary), I found this critical event, which had occurred several days earlier:

  • 1564: File share witness resource 'File Share Witness' failed to arbitrate for the file share '[SharePath]'. Please ensure that the file share '[SharePath]' exists and is accessible by the cluster.

Looking further, I found several error events like the following occurring at around the same time:

  • 1069: Cluster resource '[AvailabilityGroupName]' of type 'SQL Server Availability Group' in clustered role '[AvailabilityGroupName]' failed.
  • 1205: The Cluster service failed to bring the clustered role '[AvailabilityGroupName]' completely online or offline.  One or more resources may be in a failed state...

including this one:

  • 1254: Clustered role '[AvailabilityGroupName]' has exceeded its failover threshold. It has exhausted the configured number of failover attempts within the failover period of time allotted to it and will be left in a failed state. No additional attempts will be made to bring the role online or fail it over to another node in the cluster. Please check the events associated with the failure. After the issues causing the failure are resolved the role can be brought online manually or the cluster may attempt to bring it online again after the restart delay period.

SQL Server event logs

The SQL Server event logs contained similar messages from about the same time:

  • AlwaysOn Availability Groups connection with secondary database terminated for primary database...
  • Remote harden of transaction 'user_transaction'...failed
  • The local availability replica of availability group '[AvailabilityGroupName]' is in a failed state

Solution

This problem looks more complex than it actually is. To resolve it, resume data movement for each secondary database that is not in a synchronized state:

  1. On either SQL Server, launch SQL Server Management Studio (SSMS).
  2. Connect to the SQL Server that is the secondary server in the Always On Availability Group.
  3. In Object Explorer, expand: Always On High Availability > Availability Groups > [AvailabilityGroupName] > Availability Databases
  4. Right-click on each database, and then select Resume Data Movement.

The larger the database, the longer it will take to catch up once data movement has resumed.
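
If there are many databases, right-clicking each one gets tedious. The T-SQL counterpart of Resume Data Movement is ALTER DATABASE ... SET HADR RESUME, so one option is to generate a statement per database and run the batch in a query window while connected to the secondary, as in step 2. The following is only a minimal sketch of that idea: the helper names and the database names are hypothetical, and the script only builds the text of the statements; it does not connect to SQL Server.

```python
from typing import Iterable


def quote_identifier(name: str) -> str:
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def build_resume_script(database_names: Iterable[str]) -> str:
    """Return one ALTER DATABASE ... SET HADR RESUME statement per database."""
    statements = [
        f"ALTER DATABASE {quote_identifier(name)} SET HADR RESUME;"
        for name in database_names
    ]
    return "\n".join(statements)


if __name__ == "__main__":
    # Hypothetical names: replace with the databases shown as "(Not Synchronizing)".
    not_synchronizing = ["WSS_Content", "SharePoint_Config"]
    print(build_resume_script(not_synchronizing))
```

Paste the printed statements into an SSMS query window and review them before executing. Quoting each name with brackets keeps the statements valid for database names that contain spaces or hyphens.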

Notes

  • The issue was triggered by a GPO update pushed to the file server hosting the quorum folder.