Recover Deleted Cluster Virtual Computer Object
I recently delivered a CSRES engagement for a customer running a Windows Server 2008 R2 failover cluster. In case this is the first time you have heard of CSRES, it stands for “Cluster Service Recovery Execution Service”. You can find more information here about this CSRES service. Please note that this service is now also available for Windows Server 2012 Failover Cluster.
Here is an example of how the service is delivered. We simulate several cluster failure scenarios and use the customer’s backup application to recover from the previous backup. The recovery steps are recorded to create a draft Disaster Recovery document. The recovery time is also recorded to calculate the RTO (Recovery Time Objective).
The customer had a Windows Server 2008 R2 Failover Cluster incident last year. Someone deleted a computer account from AD. Unfortunately, that computer account was a Virtual Computer Object (VCO) used by a SQL Cluster Network Name resource. The SQL Cluster Network Name failed to come online during the next SQL cluster group failover, which caused downtime for the SQL service. I don’t know exactly what happened at that time, but the customer ended up re-installing the SQL cluster.
In a Windows Server 2008 R2 Failover Cluster, it is somewhat complex for a Cluster Admin to recover a deleted VCO. Basically, if you are not a member of AD Admins, you need an AD Admin to perform certain steps. The following two blogs have detailed explanations and steps.
Recovering a Deleted Cluster Name Object (CNO) in a Windows Server 2008 Failover Cluster
Recovering a Deleted Cluster Name Object (CNO) in a Windows Server 2008 Failover Cluster, Part 2
Starting with Windows Server 2012 Failover Cluster, a new feature, “Automated repair of cluster virtual computer objects (VCOs) if they are deleted accidentally”, was introduced. The network name resource will be brought ONLINE even if the associated VCO has been deleted from AD. Event ID 1207 is still reported, but now it says:
“The computer object associated with the cluster network name resource 'CAP2' could not be updated in domain 'contoso1.local' during the Resource post online operation.”
To recover the missing VCO in a Windows Server 2012/2012 R2 Failover Cluster, simply take the network name resource offline, right-click the resource name, then select “More Actions – Repair”. This creates a new VCO computer account in AD.
You can use the following cmdlet to find out which domain controller was used to create the VCO.
Get-ClusterResource "<network name resource>" | Get-ClusterParameter | Where-Object { $_.Name -match "CreatingDC" }
On that domain controller, Event ID 4741 is recorded in the Security Event Log for the VCO creation.
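If you prefer to look for that event from PowerShell rather than Event Viewer, something like the following might work. It is only a sketch: Get-VcoCreationEvent is a helper name I made up, the 24-hour look-back is arbitrary, and it assumes your account is allowed to read the Security log on the domain controller remotely. It also trims leading backslashes from the domain controller name in case the CreatingDC value carries them.

```powershell
# Hypothetical helper: search a DC's Security log for computer-account
# creation events (ID 4741) that mention the VCO name.
function Get-VcoCreationEvent {
    param(
        [Parameter(Mandatory = $true)] [string] $DomainController,
        [Parameter(Mandatory = $true)] [string] $VcoName,
        [int] $LookBackHours = 24   # arbitrary default, adjust as needed
    )

    # Strip leading backslashes in case the name is given as \\server.
    $dc = $DomainController.TrimStart('\')

    $filter = @{
        LogName   = 'Security'
        Id        = 4741
        StartTime = (Get-Date).AddHours(-$LookBackHours)
    }

    # Get-WinEvent raises an error when nothing matches, so silence it
    # and simply return nothing in that case.
    Get-WinEvent -ComputerName $dc -FilterHashtable $filter -ErrorAction SilentlyContinue |
        Where-Object { $_.Message -match [regex]::Escape($VcoName) } |
        Select-Object TimeCreated, Id, MachineName, Message
}

# Example call (DC01 is a placeholder for the CreatingDC value):
# Get-VcoCreationEvent -DomainController 'DC01' -VcoName 'CAP2'
```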
Since the VCO is a new computer account in AD, any additional SPN (Service Principal Name) registered with the previous VCO will be missing. Here is the SPN list before the VCO deletion.
Here is the SPN list after repairing the VCO. You can see “test/cap2” is missing from the SPN list of the new VCO.
An SPN is required for Kerberos authentication. Although I have never seen it happen, in theory a missing SPN will cause Kerberos authentication to fail, which may in turn prevent client applications from connecting to the cluster resource. As a Cluster Admin, you should notify your AD Admin that you have recovered the VCO, and ask them to re-register any missing SPNs. Let the AD Admin scratch their head figuring out how to find the missing SPNs.
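To save the AD Admin some head-scratching, the before and after SPN lists can be compared in PowerShell. Here is a minimal sketch. Get-MissingSpn is a helper name I made up, and apart from “test/cap2” the SPN values below are illustrative placeholders, not the actual lists from my lab.

```powershell
# Hypothetical helper: return SPNs present before the deletion but
# absent after the repair. -notcontains compares case-insensitively.
function Get-MissingSpn {
    param(
        [string[]] $Before,
        [string[]] $After = @()
    )
    $Before | Where-Object { $After -notcontains $_ }
}

# Illustrative lists; only 'test/cap2' is taken from the example above.
$before = @('HOST/CAP2', 'HOST/CAP2.contoso1.local', 'test/cap2')
$after  = @('HOST/CAP2', 'HOST/CAP2.contoso1.local')

Get-MissingSpn -Before $before -After $after
```

Whatever this returns is the list to hand to your AD Admin for re-registration.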
Tips:
============
For Windows Server 2008 R2 Failover Cluster admins, please make good friends with your AD admins. You need their help to recover the missing VCO so that the corresponding network name resource can be brought online.
For Windows Server 2012/2012 R2 Failover Cluster admins, please record every VCO’s SPNs using the “setspn.exe -L <VCO Computer Name>” command. It will help your AD Admins quickly re-register all missing SPNs and complete the VCO recovery.
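One way to keep such a record is to turn the setspn output into a plain list and save it to a file. The sketch below assumes that “setspn.exe -L” prints a header line followed by one indented SPN per line. ConvertFrom-SetSpnOutput is a helper name I made up, and the file name is only a placeholder.

```powershell
# Hypothetical helper: pull the SPN entries out of "setspn.exe -L" output.
# Assumption: SPN entries are the indented lines under the header line.
function ConvertFrom-SetSpnOutput {
    param([string[]] $Lines)

    $Lines |
        Where-Object { $_ -match '^\s+\S' } |   # keep indented, non-empty lines
        ForEach-Object { $_.Trim() }            # drop the indentation
}

# Example use on a machine that can reach AD (left commented out here):
# $spns = ConvertFrom-SetSpnOutput -Lines (setspn.exe -L CAP2)
# $spns | Set-Content -Path '.\CAP2-spn-baseline.txt'
```

Keep the saved file somewhere outside the cluster, so it is still available when you need it.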