Introducing Geo-replication for Windows Azure Storage

We are excited to announce that we are now geo-replicating customers' Windows Azure Blob and Table data, at no additional cost, between two locations hundreds of miles apart within the same region (i.e., between North and South US, between North and West Europe, and between East and Southeast Asia). Geo-replication is provided for additional data durability in case of a major data center disaster.

Storing Data in Two Locations for Durability

With geo-replication, Windows Azure Storage now keeps your data durable in two locations. In both locations, Windows Azure Storage constantly maintains multiple healthy replicas of your data.

The location where you read, create, update, or delete data is referred to as the primary location. The primary location exists in the region you choose when you create an account via the Azure Portal (e.g., North Central US). The location to which your data is geo-replicated is referred to as the secondary location. The secondary location is determined automatically from the location of the primary; it is the other data center in the same region as the primary. In this example, the secondary would be located in South Central US (see the table below for the full listing).

The Azure Portal currently displays only the primary location; in the future, it will be updated to show both the primary and secondary locations. To view the primary location for your storage account, click the account of interest in the Azure Portal; the primary region is displayed on the lower right side under Country/Region, as highlighted below.

[Screenshot: Azure Portal storage account view, with the primary region highlighted under Country/Region]

The following table shows the primary and secondary location pairings:

Primary            Secondary
-----------------  -----------------
North Central US   South Central US
South Central US   North Central US
East US            West US
West US            East US
North Europe       West Europe
West Europe        North Europe
South East Asia    East Asia
East Asia          South East Asia

Geo-Replication Costs and Disabling Geo-Replication

Geo-replication is included in current pricing for Azure Storage. This is called Geo Redundant Storage (GRS).

If you do not want your data geo-replicated, you can disable geo-replication for your account. This is called Locally Redundant Storage (LRS), and it is priced 23% to 34% lower than geo-replicated storage, depending on how much data is stored. See here for more details on Locally Redundant Storage (LRS).

When you turn geo-replication off, the data will be deleted from the secondary location. If you decide to turn geo-replication on again after you have turned it off, there is a re-bootstrap egress bandwidth charge (based on the data transfer rates) for copying your existing data from the primary to the secondary location to kick start geo-replication for the storage account. This charge will be applied only when you turn geo-replication on after you have turned it off. There is no additional charge for continuing geo-replication after the re-bootstrap is done.

Currently all storage accounts are bootstrapped and in geo-replication mode between primary and secondary storage locations.

How Geo-Replication Works

When you create, update, or delete data in your storage account, the transaction is fully replicated on three different storage nodes across three fault domains and upgrade domains inside the primary location, and only then is success returned to the client. Then, in the background, the primary location asynchronously replicates the recently committed transaction to the secondary location. That transaction is then made durable by fully replicating it across three different storage nodes in different fault and upgrade domains at the secondary location. Because the updates are geo-replicated asynchronously, there is no change in existing performance for your storage account.

Our goal is to keep the data durable at both the primary and secondary location. This means we keep enough replicas in both locations to ensure that each location can recover by itself from common failures (e.g., a disk, node, rack, or top-of-rack (TOR) switch failing), without having to talk to the other location. The two locations only have to talk to each other to geo-replicate the recent updates to storage accounts. They do not have to talk to each other to recover data after common failures. This is important, because it means that if we had to fail over a storage account from the primary to the secondary, all the data that had been committed to the secondary location via geo-replication would already be durable there.

With this first release of geo-replication, we do not provide an SLA for how long it will take to asynchronously geo-replicate the data, though transactions are typically geo-replicated within a few minutes after they have been committed in the primary location.
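The write path described in this section can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `Location`, `GeoReplicatedAccount`, and their methods are hypothetical names invented for illustration, not part of any Windows Azure Storage API, and the real service is of course far more involved.

```python
from collections import deque


class Location:
    """Toy model of one storage location that keeps three replicas (hypothetical)."""

    def __init__(self, name, replica_count=3):
        self.name = name
        self.replicas = [[] for _ in range(replica_count)]

    def commit(self, txn):
        # A transaction is durable here once every replica has recorded it.
        for replica in self.replicas:
            replica.append(txn)


class GeoReplicatedAccount:
    """Toy model: synchronous commit on the primary, asynchronous geo-replication."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.pending = deque()  # committed on the primary, not yet on the secondary

    def write(self, txn):
        # Synchronous part: replicate inside the primary, then acknowledge.
        self.primary.commit(txn)
        self.pending.append(txn)
        return "success"

    def geo_replicate(self, max_txns=None):
        # Background part: ship pending transactions, in commit order.
        shipped = 0
        while self.pending and (max_txns is None or shipped < max_txns):
            self.secondary.commit(self.pending.popleft())
            shipped += 1
        return shipped

    def unreplicated(self):
        # Transactions that would be lost if the primary were destroyed right now.
        return list(self.pending)


if __name__ == "__main__":
    account = GeoReplicatedAccount(
        Location("North Central US"), Location("South Central US")
    )
    account.write("create blob foo")
    account.write("update blob foo")
    account.geo_replicate(max_txns=1)
    print(account.unreplicated())
```

In this model the client gets its acknowledgement before anything reaches the secondary, which is why write latency is unaffected, and also why whatever is still in `pending` is exposed to loss in a disaster.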

How Geo-Failover Works

In the event of a major disaster that affects the primary location, we will first try to restore the primary location. Depending on the nature of the disaster and its impact, on rare occasions we may not be able to restore the primary location, and we would need to perform a geo-failover. When this happens, affected customers will be notified via their subscription contact information (we are investigating more programmatic ways to perform this notification). As part of the failover, the customer’s “account.service.core.windows.net” DNS entry would be updated to point to the secondary location instead of the primary location. Once this DNS change has propagated, the existing Blob and Table URIs will work. This means that you do not need to change your application’s URIs; all existing URIs will work the same before and after a geo-failover.

For example, if the primary location for a storage account “myaccount” was North Central US, then the DNS entry for myaccount.<service>.core.windows.net would direct traffic to North Central US. If a geo-failover became necessary, the DNS entry for myaccount.<service>.core.windows.net would be updated so that it would then direct all traffic for the storage account to South Central US.
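The following sketch models why application URIs stay the same across a geo-failover: only the DNS record behind the host name changes. `ToyDns`, `storage_host`, and `geo_failover` are hypothetical helpers written for this illustration; they do not perform real DNS lookups and are not part of any Azure SDK.

```python
def storage_host(account, service):
    """Build the host name for a storage account and service (e.g. 'blob', 'table')."""
    return f"{account}.{service}.core.windows.net"


class ToyDns:
    """In-memory stand-in for DNS: maps a host name to the location serving it."""

    def __init__(self):
        self.records = {}

    def point(self, host, location):
        self.records[host] = location

    def resolve(self, host):
        return self.records[host]


def geo_failover(dns, account, services, new_primary):
    # Repoint every service endpoint of the account; the host names are unchanged.
    for service in services:
        dns.point(storage_host(account, service), new_primary)


if __name__ == "__main__":
    dns = ToyDns()
    services = ["blob", "table"]
    for service in services:
        dns.point(storage_host("myaccount", service), "North Central US")

    host = storage_host("myaccount", "blob")
    print(host, "->", dns.resolve(host))

    geo_failover(dns, "myaccount", services, "South Central US")
    print(host, "->", dns.resolve(host))
```

The application keeps using the same `host` value before and after the failover; only the location it resolves to differs.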

After the failover occurs, the location that is accepting traffic is considered the new primary location for the storage account. This location will remain the primary location unless another geo-failover were to occur. Once the new primary is up and accepting traffic, we will bootstrap a new secondary, which will also be in the same region, for the failed-over storage accounts. In the future we plan to support the ability for customers to choose their secondary location (when we have more than two data centers in a given region), as well as the ability to swap the primary and secondary locations for a storage account.

Order of Geo-Replication and Transaction Consistency

Geo-replication ensures that all the data within a PartitionKey is committed in the same order at the secondary location as at the primary location. There are, however, no geo-replication ordering guarantees across partitions, which means that different partitions can geo-replicate at different speeds. Once all the updates have been geo-replicated and committed at the secondary location, the secondary location will have exactly the same state as the primary location. Because geo-replication is asynchronous, though, recent updates can be lost in the event of a major disaster.

For example, consider the case where we have two blobs, foo and bar, in our storage account (for blobs, the complete blob name is the PartitionKey). Now say we execute transactions A and B on blob foo, and then execute transactions X and Y against blob bar. It is guaranteed that transaction A will be geo-replicated before transaction B, and that transaction X will be geo-replicated before transaction Y. However, no other guarantees are made about the relative timing of geo-replication between the transactions against foo and the transactions against bar. If a disaster prevented recent transactions from being geo-replicated, it would be possible for transactions A and X to be geo-replicated while transactions B and Y are lost. Or transactions A and B could have been geo-replicated while neither X nor Y made it. The same holds true for operations involving Tables, except that the partitions are determined by the application-defined PartitionKey of the entity instead of the blob name. For more information on partition keys, please see Windows Azure Storage Abstractions and their Scalability Targets.
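The foo/bar example can be made concrete by enumerating every state the secondary could be left in after a disaster. The sketch below assumes only the guarantee stated above: within a partition, transactions arrive in order, so the secondary holds some prefix of each partition's history, and the prefixes are independent across partitions. The function name `possible_secondary_states` is hypothetical.

```python
from itertools import product


def possible_secondary_states(primary_log):
    """Enumerate the states the secondary could hold after a disaster.

    primary_log maps each partition key to its ordered list of transactions
    committed on the primary. Per-partition ordering means the secondary holds
    a prefix of each list; no ordering is assumed across partitions.
    """
    keys = sorted(primary_log)
    prefix_choices = [
        [tuple(primary_log[key][:n]) for n in range(len(primary_log[key]) + 1)]
        for key in keys
    ]
    return [dict(zip(keys, combo)) for combo in product(*prefix_choices)]


if __name__ == "__main__":
    states = possible_secondary_states({"foo": ["A", "B"], "bar": ["X", "Y"]})
    for state in states:
        print(state)
```

Both outcomes mentioned in the text appear in the enumeration (A and X without B and Y; A and B without X or Y), while a state containing B without A, or Y without X, never does.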

Because of this, one best practice for getting the most out of geo-replication is to avoid cross-PartitionKey relationships whenever possible. This means you should try to restrict relationships for Tables to entities that have the same PartitionKey value. Since all transactions within a single partition are geo-replicated in order, this guarantees those relationships will be committed in order on the secondary.

The only multiple object transaction supported by Windows Azure Storage is Entity Group Transactions for Windows Azure Tables, which allow clients to commit a batch of entities together as a single atomic transaction. Geo-replication also treats this batch as an atomic operation. Therefore, the whole batch transaction is committed atomically on the secondary.
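One way to act on this is to group related entities by PartitionKey before writing them, so that each group can be submitted as a single Entity Group Transaction (which requires all entities in the batch to share a PartitionKey). The sketch below is a minimal, SDK-independent illustration: entities are plain dictionaries, `group_into_batches` is a hypothetical helper, and the default batch size of 100 is an assumption that should be checked against the current service limits.

```python
from collections import defaultdict


def group_into_batches(entities, max_batch_size=100):
    """Split entities into batches that each contain a single PartitionKey.

    max_batch_size is an assumed per-batch limit; adjust it to the service's
    documented limit for Entity Group Transactions.
    """
    by_partition = defaultdict(list)
    for entity in entities:
        by_partition[entity["PartitionKey"]].append(entity)

    batches = []
    for group in by_partition.values():
        # Chunk large partitions so no batch exceeds the size limit.
        for start in range(0, len(group), max_batch_size):
            batches.append(group[start:start + max_batch_size])
    return batches


if __name__ == "__main__":
    # An order and its line items share a PartitionKey, so they can be
    # committed (and geo-replicated) together as one atomic batch.
    entities = [
        {"PartitionKey": "order-1", "RowKey": "header"},
        {"PartitionKey": "order-1", "RowKey": "item-1"},
        {"PartitionKey": "order-2", "RowKey": "header"},
    ]
    for batch in group_into_batches(entities):
        print([(e["PartitionKey"], e["RowKey"]) for e in batch])
```

Each resulting batch would then be sent with whatever batch operation your storage client library provides; because geo-replication treats the batch as atomic, the secondary never sees an order header without its items when both are written in the same batch.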

Summary

This is our first step in geo-replication, where we are now providing additional durability in case of a major data center disaster. The next steps involve developing features needed to help applications recover after a failover, which is an area we are investigating further.

Brad Calder and Monilee Atkinson

Comments

  • Anonymous
    September 18, 2011
    It's a nice feature, though I have a few questions:
  1. After failover to the secondary storage, when would it come back to the primary storage again? (The secondary storage, which is the new primary after a failover, would increase latency.)
  2. Are you planning to provide this with API support to be used in Azure Traffic Manager?
    Keep it up
    -Sachin (Sachin at cumulux dot com)
  • Anonymous
    September 18, 2011
    The comment has been removed
  • Anonymous
    September 19, 2011
    Hi, great news, thanks for that. I have 3 questions:
  1. Just to understand better: when I perform an action on the main table, it is replicated to 3 replicas, and then to another 3 replicas in the secondary location? So we have 6 places where our data exists?
  2. When will we have this in SQL Azure?
  3. Is there a GUI for the secondary, geo-replicated location?
    Thanks for the efforts, pini
  • Anonymous
    September 19, 2011
    Hi Pini, here are some answers to your questions:
  1. Correct, we keep multiple copies of your data in each of the 2 locations. This allows us to recover from common hardware failures within the same data center, so if there is a failover we have a durable copy of all of the data that was committed to the secondary ready to be used.
  2. We don’t have a roadmap for SQL Azure. Possibly ask the question here: social.msdn.microsoft.com/.../threads
  3. There isn’t any at this time, but that is a feature request we have.
    Thanks, Brad
  • Anonymous
    February 16, 2012
    The comment has been removed

  • Anonymous
    February 18, 2012
    Hi Laxmikant,
    For (1), we will look at doing that in the future when we have more than two data centers in a given region.
    For (2), with asynchronous geo-replication there is no way to do that if there is an unplanned failover (major data center disaster), since the whole focus of async is to commit the data quickly on the primary, ack back to the client, and then asynchronously geo-replicate the changes to the secondary. We have had feature requests for providing synchronous geo-rep (RPO of 0), and yes, that would ensure that the update is on both the primary and secondary before returning success back to the client.
    What some customers do right now to achieve that level of consistency (between the primary and secondary) is to perform their own form of geo-rep at the application level, where their application logic stores the data in two locations and maintains consistency between them.
    Thanks, Brad

  • Anonymous
    June 29, 2012
    Hi Brad, thanks for this blog post. So, tables and blobs are replicated, but what about queues? Thank you, Roland

  • Anonymous
    July 02, 2012
    Hi Roland, we geo-replicate all blob and table data, but not queues. In the future we will be adding geo-replication for queues, but we do not have a timeline as to when that will happen. Thanks, Brad

  • Anonymous
    September 26, 2012
    The comment has been removed

  • Anonymous
    September 28, 2012
    Hi Raghu,
    For your question #1, you have several choices. One option is to use Windows Azure Storage's geo-replication on the disks and/or drives. A possible drawback of this approach is that if you have multiple disks attached to a single VM, the replicated state of these disks might not be totally in sync. To work around this, you need some sort of transaction semantics so that you can get the disks/drives to a known synchronization point after a failover. One possible method to get this kind of synchronization is to use Windows’ Volume Shadow Copy Service (for more information, see msdn.microsoft.com/.../aa384649(v=vs.85).aspx).
    Another option is to build your own geo-replication, using "log shipping" or "mirroring" functionality. In this case, you can either turn geo-replication off in Windows Azure Storage and send the data between VMs yourself, or write your log-shipping logs onto the disk/drive and then replay them as needed after a geo-failover.
    Thanks,
    Andrew Edwards, Windows Azure Storage

  • Anonymous
    October 26, 2012
    Does this support active-active replication? Can users configure unidirectional and bidirectional replication?

  • Anonymous
    October 27, 2012
    The comment has been removed

  • Anonymous
    May 01, 2014
    I have a quick question about geo-failover - what are the conditions for invoking geo-failover? What sort of suggestions does your team have for implementing in-house geo-failover (in terms of repointing resources to another store and replicating that store).  I'd be really interested in how other Azure users are addressing this kind of scenario.

  • Anonymous
    May 02, 2014
    @Alistair
    "I have a quick question about geo-failover - what are the conditions for invoking geo-failover?"
    When the storage service in your primary location becomes unavailable and cannot be restored, we will perform a geo-failover. Events that may cause this kind of unrecoverable damage to the primary location are usually natural disasters, such as earthquakes, or other events that may destroy a data center. Any other issue that causes unavailability but that we can recover from will not lead to a geo-failover. Giving our users the ability to control geo-failover is on the list of feature requests, but we do not have a timeline to share.
    "What sort of suggestions does your team have for implementing in-house geo-failover (in terms of repointing resources to another store and replicating that store). I'd be really interested in how other Azure users are addressing this kind of scenario."
    There are a couple of options:
  1. Via RA-GRS: You can enable RA-GRS (see blogs.msdn.com/.../introducing-read-access-geo-replicated-storage-ra-grs-for-windows-azure-storage.aspx for details), and then, when the primary location is unavailable, you can use the secondary location to serve reads. If you need writes to succeed, then in parallel you can copy the data from this secondary to a new storage account so that you can start serving writes too.
  2. Queuing writes to multiple locations: You can queue your writes and copy them to multiple locations so that you maintain active-active. However, there are nuances around ensuring eventual consistency (for example, what if a write to the primary fails).
    Thanks, Jai