What mitigation strategies are implemented for zone-redundant storage if a zone becomes unavailable?

ML-1722 100 Reputation points
2024-05-26T11:20:29.65+00:00

Hi all,

Azure docs says that a zonal failover may involve networking updates such as DNS repointing. In which scenarios will DNS repointing be involved? As you know, application-level DNS cache may affect the effectiveness of DNS repointing. Are there alternative methods implemented as well, e.g. with a zone-redundant load balancer?

Azure Storage Accounts
Azure Storage Accounts
Globally unique resources that provide access to data management services and serve as the parent namespace for the services.
2,863 questions
{count} votes

3 answers

Sort by: Most helpful
  1. Marcin Policht 16,420 Reputation points MVP
    2024-05-26T13:16:53.1866667+00:00

    As per https://video2.skills-academy.com/en-us/azure/storage/common/storage-redundancy#zone-redundant-storage (which I gather is what you're referring to)

    "When designing applications for ZRS, follow practices for transient fault handling, including implementing retry policies with exponential back-off."

    A write request to a storage account that is using ZRS happens synchronously. The write operation returns successfully only after the data is written to all replicas across the three availability zones. If an availability zone is temporarily unavailable, the operation returns successfully after the data is written to all available zones."

    In other words, a zone failure would need to be mitigated on the side of the application that performs a write (since that write operation might temporarily fail) - so that operation is retried afterwards


    If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.

    hth

    Marcin


  2. Marcin Policht 16,420 Reputation points MVP
    2024-05-26T13:18:41.8633333+00:00

    Btw. load balancer would not support this functionality, since the zonal endpoints on the storage side are not exposed.


    If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.

    hth

    Marcin

    0 comments No comments

  3. Anand Prakash Yadav 7,700 Reputation points Microsoft Vendor
    2024-05-28T08:45:47.1733333+00:00

    Hello ML-1722,

    Thank you for posting your query here!

    I understand that your question is regarding the scenarios in which DNS repointing might be involved, along with an explanation of other potential networking updates that Azure might undertake during a zone failure.

    Zone Failover Scenarios:

    · When an entire availability zone becomes unavailable (due to maintenance, hardware failure, or other issues), Azure initiates failover procedures.

    · Failover ensures that your services remain accessible by redirecting traffic to healthy zones.

    · DNS repointing is one of the steps taken during this process.

    Specific Scenarios for DNS Repointing:

    Zone Unavailability: If a zone fails, Azure updates DNS records to redirect traffic away from the affected zone.

    Storage Account Access: For storage accounts using ZRS, DNS repointing ensures that clients can still access the storage account even if one zone is down.

    Service Endpoints: DNS records are updated to point to the remaining available zones’ endpoints.

    Application-Level Handling: Applications should handle retries and re-establish connections after DNS updates.

    Other Networking Updates:

    While DNS repointing is common, other networking updates may also occur during failover:

    Routing Updates: Azure updates internal routing tables to route traffic away from the failed zone.

    Load Balancer Configuration: If you’re using a load balancer, its configuration may be adjusted to distribute traffic across healthy zones.

    Virtual Network Gateway Updates: For services like VPN gateways, Azure may reconfigure routes to avoid the failed zone.

    Application Considerations:

    Applications must be resilient to transient failures during DNS updates.

    Implement retry logic for any operations affected by DNS changes.

    Monitor DNS propagation and adjust TTL values to minimize caching.

    Please note that apart from DNS repointing, Azure orchestrates multiple actions to maintain availability during zone failures.

    Do let us know if you have any further queries. I’m happy to assist you further.

    Please do not forget to "Accept the answer” and “up-vote” wherever the information provided helps you, this can be beneficial to other community members.