Exchange 2013: How does Dynamic Quorum work for a two-node DAG?

Background

A two-node DAG with a file share witness (FSW) server. One of the nodes is 'down' (I have deliberately left it that way); the cluster has quorum and all services are online.

I am trying to understand the following: if a node's State is Down, isn't the Dynamic Quorum Group Manager supposed to trigger and set that server's DynamicWeight to 0?

If it is not doing so, is this by design, or is something not quite right that I need to fix?

Here is the link to the test that triggered this question. And yes, we are talking about the Windows Server 2012 R2 feature as used by Exchange Server 2013.

 

The Test Environment

  • Two Exchange 2013 SP1 servers: NodeA, NodeB
  • One file share witness: WitA
  • Two domain controllers (DCs)
  • AD site: Default-First-Site
  • Domain/forest: single
  • OS: Windows Server 2012 R2

 

Troubleshooting info below:

PS C:\Windows\system32> Get-ClusterNode | ft name, dynamicweight, state, nodeweight,id -AutoSize

Name       DynamicWeight State NodeWeight Id
----       ------------- ----- ---------- --
exch1             1      Down        1    1
exch2             1      Up          1    2
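The question boils down to one check: is any Down node still holding a dynamic vote? That check can be scripted against the captured output. Below is a minimal Python sketch. It assumes the Get-ClusterNode table above has been saved as plain text. The parser and helper names are hypothetical and are not part of the FailoverClusters module.

```python
# Output of: Get-ClusterNode | ft name, dynamicweight, state, nodeweight, id -AutoSize
# (copied verbatim from the troubleshooting info above)
CLUSTER_NODES = """\
Name       DynamicWeight State NodeWeight Id
----       ------------- ----- ---------- --
exch1             1      Down        1    1
exch2             1      Up          1    2
"""


def parse_nodes(table: str) -> list[dict]:
    """Parse the whitespace-separated Get-ClusterNode table into one dict per node."""
    lines = [ln for ln in table.splitlines() if ln.strip()]
    header = lines[0].split()
    # lines[1] is the dashed separator row, so data rows start at index 2
    return [dict(zip(header, ln.split())) for ln in lines[2:]]


def down_but_voting(rows: list[dict]) -> list[str]:
    """Names of nodes whose State is Down but whose DynamicWeight is still > 0."""
    return [r["Name"] for r in rows
            if r["State"] == "Down" and int(r["DynamicWeight"]) > 0]


if __name__ == "__main__":
    print(down_but_voting(parse_nodes(CLUSTER_NODES)))
```

With the output shown above, the helper flags exch1. That is exactly the behaviour the question is about.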

 

PS C:\Windows\system32> (Get-Cluster).WitnessDynamicWeight
1

 

PS C:\Windows\system32> Get-ClusterResource

Name                          State                         OwnerGroup                    ResourceType
----                          -----                         ----------                    ------------
Cluster IP Address            Online                        Cluster Group                 IP Address
Cluster Name                  Online                        Cluster Group                 Network Name
File Share Witness (\\fs1...  Online                        Cluster Group                 File Share Witness

 

Validation test: Quorum Configuration
Description: Validate that the current quorum configuration is optimal for the cluster.
Validating cluster quorum settings.
Witness Type: File Share Witness
Witness Resource: \\fs1.contoso.com\dag1.contoso.com
Cluster managed voting: Enabled

Voter Name                                                                                    State   Assigned Vote  Current Vote
File Share Witness (\\fs1.contoso.com\dag1.contoso.com) (\\fs1.contoso.com\dag1.contoso.com)  Online  1              1
exch1                                                                                         Down    1              1
exch2                                                                                         Up      1              1

This quorum model will be able to sustain failures of 1 node(s) if the file share witness remains available and 0 node(s) when the file share witness goes offline or fails.

 

This quorum configuration can be changed using the Configure Cluster Quorum wizard. This wizard can be started from the Failover Cluster Manager console by selecting the cluster name in the left hand pane, then in the right "actions" pane selecting "More Actions..." and then selecting "Configure Cluster Quorum Settings...".
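The report's summary line ("1 node(s) if the file share witness remains available and 0 node(s) when the file share witness goes offline") follows from simple majority arithmetic over the assigned votes. Below is a minimal Python sketch. It assumes the static one-vote-per-voter assignment shown in the table and ignores any dynamic vote adjustment. The names `VOTES`, `has_quorum` and `tolerable_node_failures` are hypothetical.

```python
from itertools import combinations

# Assumption: one vote per voter, as in the "Assigned Vote" column of the report.
# Dynamic vote adjustment is deliberately ignored in this sketch.
VOTES = {"witness": 1, "exch1": 1, "exch2": 1}
NODES = ("exch1", "exch2")


def has_quorum(alive: set) -> bool:
    """True if the surviving voters hold a strict majority of all assigned votes."""
    total = sum(VOTES.values())
    present = sum(VOTES[name] for name in alive)
    return present >= total // 2 + 1


def tolerable_node_failures(witness_up: bool) -> int:
    """Largest k such that every combination of k failed nodes still leaves quorum."""
    tolerated = 0
    for k in range(1, len(NODES) + 1):
        for failed in combinations(NODES, k):
            alive = set(NODES) - set(failed)
            if witness_up:
                alive.add("witness")
            if not has_quorum(alive):
                return tolerated
        tolerated = k
    return tolerated


print(tolerable_node_failures(witness_up=True))
print(tolerable_node_failures(witness_up=False))
```

It reports 1 tolerable node failure with the witness up and 0 with the witness down, which matches the validation report.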

 

When all servers were up

Quorum requires floor(N/2) + 1 = floor(2/2) + 1 = 2 votes (N = number of nodes), and we have 3 votes.

When one server goes down, quorum should recalculate to floor(1/2) + 1 = 1 vote. But the cluster is still counting 3 votes: 1 from the down server, 1 from the up server and 1 from the witness.

Ideally, after some time I should also be able to lose the witness and still maintain quorum (unlike what the validation test says).
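The numbers above can be checked with a few lines of arithmetic. Below is a minimal Python sketch. It assumes a plain vote-majority rule (a strict majority of the total assigned votes) and the static weights reported above, where exch1 is Down but still has DynamicWeight 1 and WitnessDynamicWeight is 1. For two nodes plus a witness, the vote-based threshold (a majority of 3 votes, i.e. 2) matches the post's node-based floor(2/2) + 1 = 2. The helper names are hypothetical.

```python
def majority(total_votes: int) -> int:
    """Votes needed for quorum: a strict majority of the total."""
    return total_votes // 2 + 1


# Weights as observed above: the down node keeps its vote, and so does the witness,
# so all three votes stay in the total.
VOTES = {"exch1": 1, "exch2": 1, "witness": 1}


def quorum(alive: set) -> bool:
    """Does this set of surviving voters hold a majority of ALL assigned votes?"""
    return sum(VOTES[v] for v in alive) >= majority(sum(VOTES.values()))


all_up = quorum({"exch1", "exch2", "witness"})   # 3 of 3 votes, 2 needed
exch1_down = quorum({"exch2", "witness"})         # 2 of 3 votes, 2 needed
witness_lost = quorum({"exch2"})                  # 1 of 3 votes, 2 needed

print(majority(3), all_up, exch1_down, witness_lost)
```

This prints `2 True True False`. As long as exch1 keeps its vote, exch2 cannot also lose the witness, which is what the validation test reports.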

 

Solution

Split-brain syndrome is prevented by always requiring a majority of the DAG members (and, in the case of DAGs with an even number of members, the DAG witness server) to be available and interacting for the DAG to be operational.

All DAGs with an even number of members must use a witness server.
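The majority rule above can be illustrated by brute force. Below is a minimal Python sketch. It assumes one vote per voter, uses the test environment's names (NodeA, NodeB, WitA), and simplifies a network split to "each voter ends up on exactly one side". It checks every two-way split and shows two things. First, a strict majority can never exist on both sides. Second, with an even number of members, it is the witness that lets one side keep quorum. The function names are hypothetical.

```python
from itertools import combinations


def majority(total_votes: int) -> int:
    """Votes needed for quorum: a strict majority of the total."""
    return total_votes // 2 + 1


def split_outcomes(voters: list) -> list:
    """For every split of the voters into two non-empty sides, report whether
    each side holds a strict majority (one vote per voter)."""
    need = majority(len(voters))
    outcomes = []
    for k in range(1, len(voters)):
        for side_a in combinations(voters, k):
            side_b = [v for v in voters if v not in side_a]
            outcomes.append((len(side_a) >= need, len(side_b) >= need))
    return outcomes


with_witness = split_outcomes(["NodeA", "NodeB", "WitA"])
without_witness = split_outcomes(["NodeA", "NodeB"])

# A strict majority can exist on at most one side, so both sides never run at once.
print(any(a and b for a, b in with_witness))
print(any(a and b for a, b in without_witness))
# With the witness, every split leaves one side with quorum; without it, none does.
print(all(a or b for a, b in with_witness))
print(any(a or b for a, b in without_witness))
```

The four lines print `False`, `False`, `True` and `False`. Without a witness, any single failure in a two-member DAG costs quorum.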

Hence a three-node cluster behaves differently from a two-node cluster: an Exchange 2013 DAG effectively forces you to always have a witness server.

You can specify only a name for the DAG and leave the Witness server and Witness directory fields empty. In this scenario, the task will search for a Client Access server that doesn't have the Mailbox server role installed. It will automatically create the default witness directory and share on that Client Access server and configure the DAG to use that server as its witness server.

You can 'override the quorum configuration using Windows Server 2012 Failover Cluster Manager'; however, using it to modify a DAG is not recommended.

If you open Failover Cluster Manager in Administrative Tools, you'll find the Database Availability Group (DAG), its cluster networks and so on. Don't try to manage the DAG using Failover Cluster Manager, as this isn't supported. The Exchange admin center (EAC; the Exchange Management Console in Exchange 2010) and the Exchange Management Shell (EMS) are the only supported ways to manage the DAG.

The only exceptions are when you are performing a datacenter switchover and/or are being assisted by Microsoft Support (Premier).

Now back to the point:

When we are left with two nodes and one witness server for Exchange high availability, the Dynamic Quorum functionality effectively stops adjusting the votes. With two nodes, floor(2/2) + 1 = 2, which means we need at least 2 votes to have quorum.

Suppose Dynamic Quorum did trigger and removed 2 votes: 1 from the witness and 1 from NodeB.

The formula would then become floor(1/2) + 1 = 1 vote, which would allow us to lose both the witness and NodeB, leaving NodeA as the last man standing, as in this article.
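The hypothetical above can be written out the same way. Below is a minimal Python sketch. It assumes a plain vote-majority rule. The weights are the "what if" from the previous paragraph, not what the cluster actually reported (the real output shows every voter with a current vote of 1). All names are hypothetical.

```python
def majority(total_votes: int) -> int:
    """Votes needed for quorum: a strict majority of the total."""
    return total_votes // 2 + 1


# HYPOTHETICAL weights: Dynamic Quorum has removed the votes of NodeB and the witness.
dynamic_votes = {"NodeA": 1, "NodeB": 0, "WitA": 0}

total = sum(dynamic_votes.values())
needed = majority(total)
nodeA_alone = dynamic_votes["NodeA"] >= needed   # NodeA as the last man standing

print(total, needed, nodeA_alone)
```

This prints `1 1 True`. It agrees with floor(1/2) + 1 = 1: with only one vote left in the total, NodeA alone satisfies the majority.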

However, this scenario reintroduces the split-brain problem in a two-node cluster: if NodeA's site is fully disconnected while NodeB and the witness can still talk to each other, they form quorum and NodeB mounts the database, which is undesirable.

 

Conclusion

Hence, contrary to what one might expect, Dynamic Quorum keeps the vote count at 3 in a two-nodes-plus-witness scenario. In turn, everything keeps running fine as long as 2 votes are available, just as in the Exchange 2010 / Windows Server 2008 days.

 


Credits

This was originally posted here by the initial author,