Configure an ILB listener for SQL Server AlwaysOn Availability Groups in Azure ARM

This blog post is essentially a porting to Azure Resource Manager (ARM) of the article below, specifically focused on Internal Azure Load Balancer (ILB), where Azure Service Manager (ASM) was originally used:

Configure an ILB listener for AlwaysOn Availability Groups in Azure https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-sql-server-configure-ilb-alwayson-availability-group-listener

The steps below assume you have already configured:

  • At least a Resource Group (RG) to host all your resources, or re-using an existing one;
  • A Virtual Network (VNET) for each site and related subnets;
  • A VPN or Express Route infrastructure to connect the two VNETs in case of multi-datacenter deployment (HA and Geo-DR);
  • The necessary VMs to host your Active Directory Forest Domain, if not leveraging on-premises infrastructure connected via VPN or Express Route;

Using Internal Load Balancing (ILB), with a private address for the listener, will imply that clients, tools and applications will only access the listener from within the same virtual network (VNET), or other connected VNETs via ExpressRoute or VPN. Only regional networks are supported, that is no “Affinity Groups”. Additionally, at the moment there is a restriction in ARM that will permit only one ILB, then one internal AG listener, per “Availability Set”. This limitation could be removed in the future.

Create ILB with Load Balancer Rule & Health-Probe

In Azure ARM, networking part is almost totally changed, then the way to create load balanced endpoints compared to previous ASM API. In the PowerShell code snippet below (also contained in the attached script) you can see an example of how to build an ILB to support a SQL Server AlwaysOn Availability Group (AD) Listener:

Some comments on the code snippet above:

  • In ARM, whenever you create an ILB, the IP address used is static by default. There is no more PowerShell cmdlet to “mark” that IP as static. For more info, see my blog post below:

 How to Set a Static Internal Private IP (DIP) for Azure VM in ARM http://blogs.msdn.com/b/igorpag/archive/2016/01/20/how-to-set-a-static-internal-private-ip-dip-for-azure-vm-in-arm.aspx

  • ILB internal IP address must be in the same VNET as SQL Server node VMs.
  • Even if health probe is required for proper AG Listener functioning, TCP port 59999 is not mandatory, but must match with the port later used to modify the cluster resource as reported in section “Create the availability group listener”.
  • In the Load Balancer rule definition, “EnableFloatingIP” switch is the ARM equivalent of ASM “DirectServerReturn” used for “Add-AzureEndpoint” PowerShell cmdlet.
  • In the Load Balancer rule definition, I assumed default TCP port 1433 for SQL Server instance: if a different port is used, please change value for “BackendPort” accordingly.
  • Additionally, “FrontendPort” can be changed to any other valid port value, but client applications and tools must explicitly reference it.

In order to gain 99,95% High-Availability SLA, all SQL Server VMs must be part of the same ARM “Availability Set” (AS). Please also note that in ARM Azure now provides three “Fault Domains” (FD) for each AS, compared to only two FDs for ASM. Additionally, if you are using an additional witness or file share node to provide adequate odd cluster quorum number, then you need to include in the same AS, but you don’t have to include that VM in the Load Balancer backend pool since SQL Server is not installed there. Instead, if you install an odd number of SQL Server nodes, you don’t have to configure a witness or file share VM, and you should include all these VMs in both the AS and Load Balancer definition.

Creating Virtual Machine NICs

Once create the load balancer (LB) there is a second essential step: you need to create NIC objects for the SQL Server VMs and add them to the “Backend Address Pool” of the just created LB above as shown in the picture/script below (complete script in the file attached to this post):

Let me emphasize that for the witness NIC, if witness is used for the cluster configuration, you don’t have to add it to the load balancer backend pool since that VM will *not* receive SQL traffic, SQL Server is not installed there. What is important instead, is to include the Witness VM in the same “Availability Set” (AS) with the SQL Server VMs for the reason explained at the very end of the previous section.

Creating SQL Server and Witness VMs

 
After creating the NICs, you can create the VMs where you will install SQL Server. If a Witness VM will be used to have an odd number of cluster nodes (and votes), remember that SQL Server must *not* be installed here but must be part of the same “Availability Set” (AS) as SQL Server instances. In the attached file to this blog post I included a sample to show how these VMs should be created: be careful especially with the disk configuration since I didn’t use any optimization, I provided this code only as a simple example.

Verify that KB2854082 is installed if necessary

Next, if any servers on the cluster are running Windows Server 2008 R2 or Windows Server 20012, you must verify that the hotfix KB2854082 is installed on each of the on-premises servers or Azure VMs that are part of the cluster. Any server or VM that is in the cluster, but not in the availability group, should also have this hotfix installed. In the remote desktop session for each of the cluster nodes, download KB2854082 to a local directory. Then, install the hotfix on each of the cluster nodes sequentially. If the cluster service is currently running on the cluster node, the server is restarted at the end of the hotfix installation.

Initialize the WSFC Cluster VMs

Now that you have the VMs ready and joined to the domain, you need to prepare these VMs and then create a Failover Cluster instance. In order to do that you can follow the steps reported at the link below in the section titled “Initialize the WSFC Cluster VMs”. Be sure to select the “PowerShell - Classic” section at the beginning of the article and don’t worry, these steps are valid also using ARM.

Configure AlwaysOn Availability Groups in Azure VM (PowerShell) https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-sql-server-alwayson-availability-groups-powershell

Configure the Availability Group

In this section you will install the Cluster including SQL Server VMs and eventually a Witness, then you will create an “Availability Group” on the SQL Server instances. To complete this procedure, you need to execute the steps reported at the link below in the section titled “Configure the Availability Group”. Be sure to select the “PowerShell - Classic” section at the beginning of the article and don’t worry, these steps are valid also using ARM.

Configure AlwaysOn Availability Groups in Azure VM (PowerShell)

https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-sql-server-alwayson-availability-groups-powershell

Open the Firewall Ports in Availability Group Nodes

In this step, you create a firewall rule to open the probe port for the load-balancer rule (59999 as specified earlier), and another rule to open the availability group listener port. Since you created the load-balancer rule on the Azure VMs that contain availability group replicas, you need to open the probe port and the listener port on the respective Azure VMs.

  1. On VMs hosting replicas, launch Windows Firewall with Advanced Security.
  2. Right-click Inbound Rules and click New Rule.
  3. In the Rule Type page, select Port, then click Next.
  4. In the Protocol and Ports page, select TCP and type 59999 in the Specific local ports box. Then, click Next.
  5. In the Action page, keep Allow the connection selected and click Next.
  6. In the Profile page, accept the default settings and click Next.
  7. In the Name page, specify a rule name, such as AlwaysOn Listener Probe Port in the Name text box, and click Finish.

Create the Availability Group Listener

 

In this step, you manually create the availability group listener in Failover Cluster Manager and SQL Server Management Studio (SSMS).

  1. Open Failover Cluster Manager from the node hosting the primary replica.
  2. Select the Networks node, and note the cluster network name. This name will be used in the $ClusterNetworkName variable in the PowerShell script.
  3. Expand the cluster name, and then click Roles.
  4. In the Roles pane, right-click the availability group name and then select Add Resource -> Client Access Point.

5. In the Name box, create a name for this new listener, then click Next twice, and then click Finish. Do not bring the listener or resource online at this point.

6. Click the Resources tab, then expand the Client Access Point you just created. You will see the IP Address resource for each of the cluster networks in your cluster. If this is an Azure-only solution, you will only see one IP address resource.

7. If you are configuring a hybrid solution, continue with this step. If you are configuring an Azure only solution, skip to the next step.

    1. Right-click the IP Address resource that corresponds to your on-premises subnet, then select Properties. Note the IP Address Name and network name.
    2. Select Static IP Address, assign the static IP address used for ILB (10.1.2.200 in the initial example above) and then click OK.

8. Right-click the IP Address resource that corresponds to your Azure subnet and then select Properties.

9. In the same IP Address properties window, change the IP Address Name. This IP address name will be used in the $IPResourceName variable of the PowerShell script. Repeat this step for each IP resource if your solution spans multiple Azure VNets.

10. Logon to one of the Cluster node, copy the PowerShell script below into a text editor and set the variables to the values you noted earlier:

# Define variables
$ClusterNetworkName = "<MyClusterNetworkName>"
# the cluster network name (Use Get-ClusterNetwork on Windows Server 2012 of higher to find the name)
$IPResourceName = "<IPResourceName>"
# the IP Address resource name
$ILBIP = “<X.X.X.X>”
# the IP Address of the Internal Load Balancer (ILB)

Import-Module FailoverClusters

# If you are using Windows Server 2012 or higher:
Get-ClusterResource $IPResourceName | Set-ClusterParameter -Multiple @{"Address"="$ILBIP";"ProbePort"="59999";"SubnetMask"="255.255.255.255";"Network"="$ClusterNetworkName";"EnableDhcp"=0}

# If you are using Windows Server 2008 R2 use this:
cluster res $IPResourceName /priv enabledhcp=0 address=$ILBIP probeport=59999  subnetmask=255.255.255.255

NOTE: If you want to check cluster resource advanced properties, you can use the simple piped cmdlets below:

 Get-ClusterResource $IPResourceName | Get-ClusterParameter 

 

11. Once you have set the variables, open an elevated Windows PowerShell window, then copy the script from the text editor and paste into your Azure PowerShell session to run it. If the prompt still shows >>, type ENTER again to make sure the script starts running.

12. Repeat this on each VM. This script configures the IP Address resource with the IP address of the cloud service and sets other parameters like the probe port. When the IP Address resource is brought online, it can then respond to the polling on the probe port from the load-balanced endpoint created earlier in this tutorial.

Bring the listener online

1. Navigate back to Failover Cluster Manager. Expand Roles and then highlight your Availability Group. On the Resources tab, right-click the listener name and click Properties.
2. Click the Dependencies tab. If there are multiple resources listed, verify that the IP addresses have OR, not AND, dependencies. Click OK.
3. Right-click the listener name and click Bring Online.
4. Once the listener is online, from the Resources tab, right-click the availability group and click Properties.

 5. Create a dependency on the listener name resource (not the IP address resources name). Click OK.

6. Launch SQL Server Management Studio and connect to the primary replica.
7. Navigate to AlwaysOn High Availability | Availability Groups | | Availability Group Listeners.
8. You should now see the listener name that you created in Failover Cluster Manager. Right-click the listener name and click Properties.
9. In the Port box, specify the port number for the availability group listener by using the $EndpointPort you used earlier (in this tutorial, 1433 was the default), then click OK.

Follow-up Items

After the availability group listener is created, it may be necessary to adjust the RegisterAllProvidersIP (default=0) and HostRecordTTL (default=1200) cluster parameters for the listener resource. These parameters may reduce reconnection time after a failover which may prevent connection timeouts. You can do this executing the script below on one of the Cluster nodes:

$CAPName = 'ListenerNetworkName'
Get-ClusterResource $CAPName | Set-ClusterParameter -Multiple @{"HostRecordTTL"=300;"RegisterAllProvidersIP"=1}

The properties were stored, but not all changes will take effect until “ListenerNetworkName” is taken offline and then online again. If you want to check default properties for the CAP (Client Access Point) network name used for AG Listener, you can use the cmdlet below and look at the output:

Get-ClusterResource $CAPName | Get-ClusterParameter 

For more information on these parameters, as well as sample code, see Create or Configure an Availability Group Listener.

Test the Availability Group Listener Client Connectivity

In this step, you test the availability group listener using a client application running on the same network.If the AlwaysOn replicas are in different subnets, clients must specify "MultisubnetFailover=True" in the connection string. This results in parallel connection attempts to replicas in the different subnets. Note that this scenario includes a cross-region AlwaysOn Availability Group deployment. One example would be to connect to the listener from one of the VMs in the same Azure VNet (but not one that hosts a replica). An easy way to complete this test is to try to connect SSMS to the availability group listener. Recently, Microsoft made available SSMS via standalone installation package, then it is no more necessary to run the full SQL Server setup to only install this tool. You can find it at the link below:

Download SQL Server Management Studio

https://msdn.microsoft.com/en-US/library/mt238290.aspx  

IMPORTANT: Even if you can still connect directly to the SQL Server instances, if you want to leverage high-availability provided by AlwaysOn AG, you need to use the Listener network name, that is the CAP cluster resource name you chose in Step[5] in the section "Create the Availability Group Listener" above in this post.

Another simple method is to run SQLCMD.exe as follows:

sqlcmd -S "<ListenerNetworkName>,<EndpointPort>" -d "<DatabaseName>" -Q "select @@servername, db_name()" -l 15

If the EndpointPort value is 1433, it is not required to specify it in the call. The previous call also assumes that the client machine is joined to the same domain and that the caller has been granted permissions on the database using windows authentication. In addition to automatically connecting clients to the primary replica, a listener can also be used to redirect read-only workloads to the secondaries. This can improve the performance and scalability of your overall solution. For more information, see Use ReadIntent Routing with Azure AlwaysOn AG Listener.

 

Test the Availability Group Listener Failover

With AlwaysOn Availability Group (AG), it is important to remember that you *don’t* have to use Failover Cluster Manager to test AG Listener failover between nodes, otherwise the operation will fail. The tool you have to use is SQL Server Management Studio (SSMS).

Once connected to the SQL Server instance/VM owning the primary replica, expand “AlwaysOn High Availability”, then right-click your custom created AG and select “Failover” from the contextual menu:

Depending on the node you are going to move the AG Listener, and then promote to Primary Replica, you may see a warning related to possible data loss: if the replica/node you are moving to is configured for synchronous data replication, then no data loss, while you will see in case of asynchronous data replication.

 

Once selected the SQL Server instance, click next and on till completion and you will see that AlwaysOn promoted a new Primary and downgraded the previous one to Secondary. IMPORTANT: If you moved/promoted a new Primary from an asynchronous secondary, SQL Server by default will stop replicating data from the new Primary to the other Secondary replicas: this is done to eventually protect new Secondary replicas from data loss. You have to explicitly resume data movement using contextual menu of each Secondary replica as shown in the picture below:

 

That’s all I wanted to share with you, hope you will find this content useful and interesting, feel free to post your comments and feedbacks here. Remember that you can also follow me on Twitter ( @igorpag). Best Regards.

ScriptSample(3).ps1

Comments

  • Anonymous
    February 24, 2016
    Hi Igor, This guide makes no mention of adding endpoints into the ILB.  Is that not necessary?  We are trying to get an Always On cluster working but we can only reach the listener from the node that is is the same subnet as the listener and I'm wondering if the cause is there are no endpoints configured. Thanks!

  • Anonymous
    February 24, 2016
    Hi Clayton, in ARM there is no more concept of "endpoint", the equivalent is the load balancer rule you can see for the variable "$lbrule1" in the initial PowerShell/Image in this post. Once defined that, you will create SQL Server VMs and you will insert their NICs into the BackendAddressPools of the created ILB and that's done. Regarding this last part, I just added an additional piece to this post to make it more clear, see "# Create NICs: #" section. Hope this will help you, regards.

  • Anonymous
    March 30, 2016
    The comment has been removed

  • Anonymous
    May 20, 2016
    How do you manage the ProbePort when there are multiple AOAG?

  • Anonymous
    June 07, 2016
    Dear Igor,You mention that there is a restriction in ARM that will permit only one ILB, then one internal AG listener, per “Availability Set” but that might change in future. Any news on that? I have a scenario that requires it.RegardsPete

    • Anonymous
      June 07, 2016
      Hi Peter, what you asked is coming, please stay tuned and check again in July.Regards.
      • Anonymous
        June 07, 2016
        Thanks for the reply, much appreciated.
        • Anonymous
          October 18, 2016
          I think that is possible now. https://azure.microsoft.com/en-us/blog/azure-networking-announcements-for-ignite-2016/"We are announcing general availability of multiple VIPs on internal load balancers and new port reuse options across public and internal load balancers. In the following week, we will be previewing two additional abilities in specific regions: Multiple IP addresses on a Network Interface Card (NIC) and enabling all NICs on a VM to have a Public IP address on the NIC or through the load balancer. "
          • Anonymous
            October 19, 2016
            Not yet.... but soon...Tested last week and feature is still to be activated.Regards. ERROR: Set-AzureRmNetworkInterface : Subscription 8e95e0bb-d7cc-4454-9443-75ca862d34c1 is not registered for feature Microsoft.Network/AllowMultipleIpConfigurationsPerNic required to carry out the requested operation. StatusCode: 400 ReasonPhrase: Bad Request
  • Anonymous
    July 25, 2016
    In the section: Creating Virtual Machine NICsThere is a reference to a variable $ILBbut I couldn't find a definition for this variable. When I tried to run this part, it bombed

    • Anonymous
      July 25, 2016
      The comment has been removed
      • Anonymous
        July 25, 2016
        Thanks!Forgot to mention that the same two commands also refer to $dns1,$dns2, which are not assigned either.
        • Anonymous
          July 25, 2016
          Easily fixed, thanks.
          • Anonymous
            July 25, 2016
            Cool! Next problem here:$StorageAccount = Get-AzureRmStorageAccount -ResourceGroupName $rgname -Name $storageacccountname0What does in storageaccountname0?
          • Anonymous
            July 25, 2016
            Simply a name for the storage account that will be created.
  • Anonymous
    July 26, 2016
    OK -- now what I'd really like to know is this:I set up 4 VMs in Azure: a DC, 2 SQLs, and a Win10 for a client. I made an AG between the 2 SQLs. Failover works fine. The listener doesn't. I suppose I need to add an ILB to make that work. Can I add an ILB to this setup or do I need to start over?If I can add an ILB, how/what parts of your scripts do I need to run?

    • Anonymous
      July 26, 2016
      You need ILB if you want to connect to the same network name even after a failover. Never tried before but I'm pretty sure you can re-read the article from the beginning and think about each action: if you have done already, you can skip. Maybe some small logic changes would be required to get an object instead of create, but again I'm comfortable that you can complete your scenario without restarting from scratch. Regards.
      • Anonymous
        July 26, 2016
        The comment has been removed
        • Anonymous
          July 27, 2016
          Which OS you have?Try to run simply "Get-ClusterResource $IPResourceName" and you will see the properties if that property exist.Regards.
          • Anonymous
            July 27, 2016
            I'm running Windows Server 2012 R2I see that the properties your script is trying to set do not in fact exist:PS C:\Users\gbritton.GREENCHAIR> Get-ClusterResource $IPResourceName |get-member TypeName: Microsoft.FailoverClusters.PowerShell.ClusterResourceName MemberType Definition---- ---------- ----------Equals Method bool Equals(System.Object obj)GetHashCode Method int GetHashCode()GetType Method type GetType()ToString Method string ToString()Update Method void Update()Characteristics Property Microsoft.FailoverClusters.PowerShell.ClusterResourceAndResourceTypeCharacteristi...Cluster Property Microsoft.FailoverClusters.PowerShell.Cluster Cluster {get;}DeadlockTimeout Property uint32 DeadlockTimeout {get;set;}Description Property string Description {get;set;}EmbeddedFailureAction Property uint32 EmbeddedFailureAction {get;set;}Id Property string Id {get;}IsAlivePollInterval Property uint32 IsAlivePollInterval {get;set;}IsCoreResource Property bool IsCoreResource {get;}IsNetworkClassResource Property bool IsNetworkClassResource {get;}IsStorageClassResource Property bool IsStorageClassResource {get;}LastOperationStatusCode Property uint64 LastOperationStatusCode {get;}LooksAlivePollInterval Property uint32 LooksAlivePollInterval {get;set;}MaintenanceMode Property bool MaintenanceMode {get;}MonitorProcessId Property uint32 MonitorProcessId {get;}Name Property string Name {get;set;}OwnerGroup Property Microsoft.FailoverClusters.PowerShell.ClusterGroup OwnerGroup {get;}OwnerNode Property Microsoft.FailoverClusters.PowerShell.ClusterNode OwnerNode {get;}PendingTimeout Property uint32 PendingTimeout {get;set;}PersistentState Property uint32 PersistentState {get;set;}ResourceSpecificData1 Property uint64 ResourceSpecificData1 {get;}ResourceSpecificData2 Property uint64 ResourceSpecificData2 {get;}ResourceSpecificStatus Property string ResourceSpecificStatus {get;set;}ResourceType Property Microsoft.FailoverClusters.PowerShell.ClusterResourceType ResourceType {get;}RestartAction Property uint32 RestartAction {get;set;}RestartDelay Property uint32 RestartDelay {get;set;}RestartPeriod Property uint32 RestartPeriod {get;set;}RestartThreshold Property uint32 RestartThreshold {get;set;}RetryPeriodOnFailure Property uint32 RetryPeriodOnFailure {get;set;}SeparateMonitor Property bool SeparateMonitor {get;set;}State Property Microsoft.FailoverClusters.PowerShell.ClusterResourceState State {get;}StatusInformation Property Microsoft.FailoverClusters.PowerShell.ClusterResourceStatusInformation StatusInfo...PS C:\Users\gbritton.GREENCHAIR>
          • Anonymous
            July 27, 2016
            There is something wrong in your Windows Server 2012 R2 installation, the property exists and is correct, see the link below: https://technet.microsoft.com/en-us/library/hh847319(v=wps.630).aspx Look for EXAMPLE2, it is exactly related to HostTTL: PS C:&gt; Get-ClusterResource –Name cluster1FS | Get-ClusterParameter –Name HostRecordTTLI would suggest to involve a Support Specialist to give a look, I cannot help you more than that.Anyway, that command is not so vital, you should be able to complete the installation and have the environment working correctly.Regards.
        • Anonymous
          July 27, 2016
          look at the article https://technet.microsoft.com/en-us/library/ee460984.aspx , syntax is correct....
          • Anonymous
            July 27, 2016
            The comment has been removed
        • Anonymous
          October 24, 2016
          You find those values in the registry as well. They are under HKEY_LOCAL_MACHINE\Cluster\Resources\AGResourceGUID\Parameters