Deploy clustered applications on Azure Elastic SAN

Azure Elastic SAN volumes can be attached to multiple compute clients simultaneously, which lets you deploy or migrate clustered applications to Azure. To share an Elastic SAN volume, you need a cluster manager such as Windows Server Failover Cluster (WSFC) or Pacemaker. The cluster manager handles cluster node communications and write locking. Elastic SAN doesn't natively offer a fully managed filesystem that can be accessed over SMB or NFS.

When used as a shared volume, Elastic SAN volumes can be shared across availability zones or regions. Sharing a volume in a locally redundant storage (LRS) SAN across zones reduces performance because of the increased latency between the volume and clients.

Limitations

  • Elastic SAN connection scripts can be used to attach shared volumes to virtual machines in Virtual Machine Scale Sets or virtual machines in Availability Sets. Fault domain alignment isn't supported.
  • The maximum number of sessions a shared volume supports is 128.
    • An individual client can create multiple sessions to an individual volume for increased performance. For example, if you create 32 sessions on each of your clients, only four clients could connect to a single volume.
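The session arithmetic above can be sketched in a few lines of Python. This is only an illustration of the division involved; the function name `max_clients` is hypothetical, and the only value taken from this article is the 128-session limit.

```python
# Limit stated in this article: a shared volume supports at most 128 sessions.
MAX_SESSIONS_PER_VOLUME = 128


def max_clients(sessions_per_client: int) -> int:
    """Return how many clients fit on one volume if each opens the same number of sessions."""
    if sessions_per_client < 1:
        raise ValueError("each client needs at least one session")
    if sessions_per_client > MAX_SESSIONS_PER_VOLUME:
        raise ValueError("a single client can't exceed the per-volume session limit")
    # Integer division: leftover sessions can't form another full client.
    return MAX_SESSIONS_PER_VOLUME // sessions_per_client


if __name__ == "__main__":
    for sessions in (1, 8, 32):
        print(f"{sessions} sessions per client: {max_clients(sessions)} clients")
```

With 32 sessions per client the function returns 4, which matches the example in the limitation above.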

See Support for Azure Storage features for other limitations of Elastic SAN.

How it works

Elastic SAN shared volumes use SCSI-3 Persistent Reservations (SCSI-3 PR) to allow initiators (clients) to control access to a shared elastic SAN volume. This protocol enables an initiator to reserve access to an elastic SAN volume, limit write (or read) access by other initiators, and, by default, persist the reservation on a volume beyond the lifetime of a session.

SCSI-3 PR has a pivotal role in maintaining data consistency and integrity within shared volumes in cluster scenarios. Compute nodes in a cluster can read or write to their attached elastic SAN volumes based on the reservation chosen by their cluster applications.

Persistent reservation flow

The following diagram illustrates a sample 2-node clustered database application that uses SCSI-3 PR to enable failover from one node to the other.

Diagram that shows clustered application.

The flow is as follows:

  1. The clustered application running on both Azure VM1 and VM2 registers its intent to read or write to the elastic SAN volume.
  2. The application instance on VM1 then takes an exclusive reservation to write to the volume.
  3. This reservation is enforced on your volume and the database can now exclusively write to the volume. Any writes from the application instance on VM2 fail.
  4. If the application instance on VM1 goes down, the instance on VM2 can initiate a database failover and take over control of the volume.
  5. The reservation taken by VM2 is now enforced on the volume, which accepts writes only from VM2 and rejects writes from VM1.
  6. The clustered application can complete the database failover and serve requests from VM2.
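The six steps above can be walked through with a small in-memory model. This is a toy sketch, not the SCSI protocol or any Elastic SAN API: the class `ToySharedVolume`, its methods, and the `run_failover` helper are all hypothetical names, and the model assumes a write-exclusive style reservation in which only the holder may write.

```python
class ReservationConflict(Exception):
    """Raised when the toy volume rejects an operation."""


class ToySharedVolume:
    """In-memory stand-in for a shared volume (hypothetical, for illustration only)."""

    def __init__(self):
        self.registered = set()  # nodes that have registered with the volume
        self.holder = None       # node that currently holds the reservation
        self.data = None         # last payload written

    def register(self, node):
        self.registered.add(node)

    def reserve(self, node):
        if node not in self.registered:
            raise ReservationConflict(f"{node} is not registered")
        if self.holder not in (None, node):
            raise ReservationConflict(f"{self.holder} already holds the reservation")
        self.holder = node

    def preempt(self, node, failed_node):
        """Move the reservation to `node` and drop the failed node's registration."""
        if node not in self.registered:
            raise ReservationConflict(f"{node} is not registered")
        self.registered.discard(failed_node)
        self.holder = node

    def write(self, node, payload):
        # Assumed write-exclusive behavior: only the holder may write.
        if self.holder is not None and node != self.holder:
            raise ReservationConflict(f"write from {node} rejected")
        self.data = payload


def try_write(volume, node, payload):
    """Return True if the write was accepted, False if it was rejected."""
    try:
        volume.write(node, payload)
        return True
    except ReservationConflict:
        return False


def run_failover():
    volume = ToySharedVolume()
    volume.register("VM1")                     # step 1: both nodes register
    volume.register("VM2")
    volume.reserve("VM1")                      # step 2: VM1 takes the reservation
    before = (try_write(volume, "VM1", "txn-1"),
              try_write(volume, "VM2", "txn-2"))   # step 3: VM2's write fails
    volume.preempt("VM2", failed_node="VM1")   # step 4: VM2 takes over
    after = (try_write(volume, "VM1", "txn-3"),
             try_write(volume, "VM2", "txn-4"))    # steps 5 and 6
    return before, after, volume.data


if __name__ == "__main__":
    print(run_failover())
```

In a real deployment the cluster manager issues the reservation commands on the application's behalf; the model only shows which writes are accepted at each step.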

The following diagram illustrates another common clustered workload consisting of multiple nodes reading data from an elastic SAN volume for running parallel processes, such as training of machine learning models.

Diagram that shows a machine learning cluster.

The flow is as follows:

  1. The clustered application running on all VMs registers its intent to read or write to the elastic SAN volume.
  2. The application instance on VM1 takes an exclusive reservation to write to the volume while opening up reads to the volume from other VMs.
  3. This reservation is enforced on the volume.
  4. All nodes in the cluster can now read from the volume. Only one node writes back results to the volume, on behalf of all nodes in the cluster.
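The read-mostly flow above can be sketched the same way. The model below is a hypothetical illustration, not an Elastic SAN API: `ToyVolume`, `run_parallel_job`, the node names, and the shard contents are all made up. It assumes a reservation that leaves reads open to every node while restricting writes to the holder.

```python
class ToyVolume:
    """In-memory stand-in for a shared volume with one writer and many readers."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.registered = set()
        self.writer = None  # node holding the write reservation

    def register(self, node):
        self.registered.add(node)

    def reserve_write_exclusive(self, node):
        if node not in self.registered:
            raise PermissionError(f"{node} is not registered")
        if self.writer not in (None, node):
            raise PermissionError(f"{self.writer} already holds the reservation")
        self.writer = node

    def read(self, node, key):
        # Reads stay open to every node in this model.
        return self.blocks[key]

    def write(self, node, key, value):
        if self.writer is not None and node != self.writer:
            raise PermissionError(f"write from {node} rejected")
        self.blocks[key] = value


def run_parallel_job():
    shards = {"shard-0": [1, 2, 3], "shard-1": [4, 5], "shard-2": [6]}
    volume = ToyVolume(shards)
    nodes = ["VM1", "VM2", "VM3"]

    for node in nodes:                        # step 1: every node registers
        volume.register(node)
    volume.reserve_write_exclusive("VM1")     # steps 2 and 3: VM1 becomes the writer

    # Step 4: each node reads its own shard and computes a partial result.
    partials = [sum(volume.read(node, f"shard-{i}")) for i, node in enumerate(nodes)]

    # Only VM1 writes the combined result back, on behalf of all nodes.
    volume.write("VM1", "result", sum(partials))
    return volume


if __name__ == "__main__":
    print(run_parallel_job().read("VM2", "result"))
```

Funneling the write through a single node keeps the other nodes read-only, which is the property the reservation enforces in this scenario.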

Supported SCSI PR commands

Elastic SAN volumes support the following SCSI PR commands. To interact with a volume, start with the appropriate persistent reservation action:

  • PR_REGISTER_KEY
  • PR_REGISTER_AND_IGNORE
  • PR_GET_CONFIGURATION
  • PR_RESERVE
  • PR_PREEMPT_RESERVATION
  • PR_CLEAR_RESERVATION
  • PR_RELEASE_RESERVATION

When using PR_RESERVE, PR_PREEMPT_RESERVATION, or PR_RELEASE_RESERVATION, provide one of the following persistent reservation types:

  • PR_NONE
  • PR_WRITE_EXCLUSIVE
  • PR_EXCLUSIVE_ACCESS
  • PR_WRITE_EXCLUSIVE_REGISTRANTS_ONLY
  • PR_EXCLUSIVE_ACCESS_REGISTRANTS_ONLY
  • PR_WRITE_EXCLUSIVE_ALL_REGISTRANTS
  • PR_EXCLUSIVE_ACCESS_ALL_REGISTRANTS

The persistent reservation type determines the access that each node in the cluster has to the volume:

Persistent Reservation Type          Reservation Holder  Registered  Others
NO RESERVATION                       N/A                 Read-Write  Read-Write
WRITE EXCLUSIVE                      Read-Write          Read-Only   Read-Only
EXCLUSIVE ACCESS                     Read-Write          No Access   No Access
WRITE EXCLUSIVE - REGISTRANTS ONLY   Read-Write          Read-Write  Read-Only
EXCLUSIVE ACCESS - REGISTRANTS ONLY  Read-Write          Read-Write  No Access
WRITE EXCLUSIVE - ALL REGISTRANTS    Read-Write          Read-Write  Read-Only
EXCLUSIVE ACCESS - ALL REGISTRANTS   Read-Write          Read-Write  No Access
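The access table can be encoded as a lookup, which may be handy when reasoning about what a given node is allowed to do. This is a sketch only: the helper names (`access_for`, `can_read`, `can_write`) and the role labels are hypothetical, and it assumes that each PR_* type listed earlier corresponds to the table row with the matching name (for example, PR_NONE to NO RESERVATION).

```python
from enum import Enum


class Access(Enum):
    READ_WRITE = "Read-Write"
    READ_ONLY = "Read-Only"
    NO_ACCESS = "No Access"


RW, RO, NA = Access.READ_WRITE, Access.READ_ONLY, Access.NO_ACCESS

# Column order follows the table: reservation holder, registered, others.
ROLES = ("holder", "registered", "others")

# Values copied from the table; None marks the N/A cell (no holder exists).
ACCESS_TABLE = {
    "PR_NONE": (None, RW, RW),
    "PR_WRITE_EXCLUSIVE": (RW, RO, RO),
    "PR_EXCLUSIVE_ACCESS": (RW, NA, NA),
    "PR_WRITE_EXCLUSIVE_REGISTRANTS_ONLY": (RW, RW, RO),
    "PR_EXCLUSIVE_ACCESS_REGISTRANTS_ONLY": (RW, RW, NA),
    "PR_WRITE_EXCLUSIVE_ALL_REGISTRANTS": (RW, RW, RO),
    "PR_EXCLUSIVE_ACCESS_ALL_REGISTRANTS": (RW, RW, NA),
}


def access_for(pr_type, role):
    """Look up the access level for a role under a reservation type."""
    return ACCESS_TABLE[pr_type][ROLES.index(role)]


def can_write(pr_type, role):
    return access_for(pr_type, role) is Access.READ_WRITE


def can_read(pr_type, role):
    return access_for(pr_type, role) in (Access.READ_WRITE, Access.READ_ONLY)


if __name__ == "__main__":
    for pr_type in ACCESS_TABLE:
        print(pr_type, {role: can_write(pr_type, role) for role in ROLES})
```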

You also need to provide a persistent-reservation-key when using:

  • PR_RESERVE
  • PR_REGISTER_AND_IGNORE
  • PR_REGISTER_KEY
  • PR_PREEMPT_RESERVATION
  • PR_CLEAR_RESERVATION
  • PR_RELEASE_RESERVATION
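The two requirement lists in this section can be combined into a small pre-flight check. The sketch below is hypothetical: `missing_fields` and its parameters are illustrative names rather than part of any Elastic SAN or operating system API, and the sets are transcribed from the lists above.

```python
from typing import List, Optional

ACTIONS = {
    "PR_REGISTER_KEY", "PR_REGISTER_AND_IGNORE", "PR_GET_CONFIGURATION",
    "PR_RESERVE", "PR_PREEMPT_RESERVATION", "PR_CLEAR_RESERVATION",
    "PR_RELEASE_RESERVATION",
}

# Actions that need a persistent reservation type, per this section.
NEEDS_TYPE = {"PR_RESERVE", "PR_PREEMPT_RESERVATION", "PR_RELEASE_RESERVATION"}

# Actions that need a persistent-reservation-key, per this section.
NEEDS_KEY = ACTIONS - {"PR_GET_CONFIGURATION"}

TYPES = {
    "PR_NONE", "PR_WRITE_EXCLUSIVE", "PR_EXCLUSIVE_ACCESS",
    "PR_WRITE_EXCLUSIVE_REGISTRANTS_ONLY", "PR_EXCLUSIVE_ACCESS_REGISTRANTS_ONLY",
    "PR_WRITE_EXCLUSIVE_ALL_REGISTRANTS", "PR_EXCLUSIVE_ACCESS_ALL_REGISTRANTS",
}


def missing_fields(action: str,
                   pr_type: Optional[str] = None,
                   key: Optional[int] = None) -> List[str]:
    """Return the names of required inputs that were not supplied for an action."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    if pr_type is not None and pr_type not in TYPES:
        raise ValueError(f"unsupported reservation type: {pr_type}")
    missing = []
    if action in NEEDS_TYPE and pr_type is None:
        missing.append("persistent reservation type")
    if action in NEEDS_KEY and key is None:
        missing.append("persistent-reservation-key")
    return missing


if __name__ == "__main__":
    print(missing_fields("PR_RESERVE"))
    print(missing_fields("PR_RESERVE", "PR_WRITE_EXCLUSIVE", key=0x1234))
```

Checking inputs before issuing a command is a convenience; the volume itself remains the authority on whether a command is accepted.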