A Script for Replicating Database Backup Files to the Azure Cloud
Let's say we have a backup process that creates backup files on a file server. For disaster recovery purposes, I need to get these files offsite in a timely manner. I have decent bandwidth out of my data centers, so I have decided to put these files into low-cost Azure storage.
NOTE This script is provided for educational purposes only. No warranties or guarantees are provided.
This PowerShell script does just that. It recursively loops through a target directory and copies files that match a particular naming pattern and are flagged for archival to Azure Blob Storage. Each copy is handled through an asynchronous job so that I can push multiple files simultaneously. A variable in the master script controls how many asynchronous jobs can run at one time, and a variable in the secondary script controls the number of threads each job can employ. (To tune the routine for my machine, I first copy a large backup file using various numbers of threads to see which gives me the best individual file throughput. I then vary the number of jobs to maximize the overall throughput of the routine.)
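To make the moving parts concrete, here is a minimal sketch of what the master script might look like. The share path, file pattern, job limit, and the `upload.ps1` file name are all illustrative assumptions, not the author's actual script.

```powershell
# Minimal sketch of the master script (illustrative names throughout).
$SourceDir   = '\\fileserver\backups'   # hypothetical backup share
$FilePattern = '*.bak'                  # hypothetical naming pattern
$MaxJobs     = 4                        # max concurrent upload jobs

# Recursively find files matching the pattern whose Archive attribute is set
$files = Get-ChildItem -Path $SourceDir -Filter $FilePattern -Recurse -File |
    Where-Object { $_.Attributes -band [IO.FileAttributes]::Archive }

foreach ($file in $files) {
    # Throttle: wait until a job slot frees up
    while ((Get-Job -State Running).Count -ge $MaxJobs) {
        Start-Sleep -Seconds 5
    }
    # Hand the file off to the secondary upload script as a background job
    Start-Job -FilePath '.\upload.ps1' -ArgumentList $file.FullName
}

# Wait for the remaining jobs and collect their output
Get-Job | Wait-Job | Receive-Job
```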
I am using Azure Blob Storage as it is the cheapest option available in Azure. That said, I need to keep in mind that Azure Blob Storage limits individual block blobs to a maximum of 200 GB. I will want to ensure that my backups are compressed for bandwidth purposes and that any files approaching 200 GB are split into multiple backup files. (If I needed to go bigger, I could use page blobs or even Azure Files, but those are more expensive and I need cheap.)
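In the same spirit, the secondary script each job runs might look like the sketch below, assuming the Az.Storage module. The account name, key source, container name, and thread count are placeholders; `-ConcurrentTaskCount` is the per-job thread knob mentioned above, and the size check guards against the block blob limit.

```powershell
# Minimal sketch of the secondary (per-job) upload script.
param([string]$FilePath)

# Storage context: the account name and key source are assumptions
$ctx = New-AzStorageContext -StorageAccountName 'mybackupacct' `
                            -StorageAccountKey $env:AZURE_STORAGE_KEY

# Refuse anything at or over the block blob ceiling discussed above
if ((Get-Item $FilePath).Length -ge 200GB) {
    throw "$FilePath is too large for a single block blob; split the backup first."
}

# -ConcurrentTaskCount controls the number of upload threads for this job
Set-AzStorageBlobContent -File $FilePath `
                         -Container 'backups' `
                         -Blob (Split-Path $FilePath -Leaf) `
                         -Context $ctx `
                         -ConcurrentTaskCount 8 `
                         -Force
```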
A few final thoughts:
- If I were to make use of this script for production purposes, I would want to add logging and error handling. I might also look at using AzCopy instead of the PowerShell cmdlets in the secondary job, because AzCopy has better job recovery features.
- As this is for DR, I might not wish to put every backup into the cloud. Depending on my tolerance for data loss in a DR scenario, I might only put certain types of backups in the cloud with a particular frequency. Not everything needs to be offsite in every scenario.
- Over time, my backup files become less and less valuable. If I were to productionize this routine, I'd probably have another routine that crawls the Azure Storage Account and prunes files meeting some kind of aging criteria (a sketch of such a pruning routine follows this list).
- In order to recover from a backup that is in the cloud, I would need a script to download the backup files to local storage. I'd probably want one that allowed me to pull down one or more files meeting various criteria, as DR may or may not involve pulling down all files for all platforms. There is a starting point for this as a third script sample in the download reference above, and a minimal download sketch also follows this list.
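As a sketch of the pruning idea, assuming the same Az.Storage module and placeholder account details, something like the following could age out old blobs; the 30-day cutoff is a hypothetical criterion:

```powershell
# Minimal sketch of a pruning routine for aged backup blobs.
$ctx = New-AzStorageContext -StorageAccountName 'mybackupacct' `
                            -StorageAccountKey $env:AZURE_STORAGE_KEY
$cutoff = (Get-Date).AddDays(-30)   # hypothetical aging criterion

# Delete every blob in the container older than the cutoff
Get-AzStorageBlob -Container 'backups' -Context $ctx |
    Where-Object { $_.LastModified -lt $cutoff } |
    ForEach-Object {
        Remove-AzStorageBlob -Blob $_.Name -Container 'backups' -Context $ctx
    }
```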
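And for recovery, a download script might start from something like this sketch; the name filter and destination folder are hypothetical parameters:

```powershell
# Minimal sketch of a recovery/download script.
param(
    [string]$NameFilter  = '*',          # e.g. 'SQLPROD01*' to pull one platform
    [string]$Destination = 'D:\Restore'  # local folder to land the files in
)

$ctx = New-AzStorageContext -StorageAccountName 'mybackupacct' `
                            -StorageAccountKey $env:AZURE_STORAGE_KEY

# Pull down every blob whose name matches the filter
Get-AzStorageBlob -Container 'backups' -Context $ctx |
    Where-Object { $_.Name -like $NameFilter } |
    ForEach-Object {
        Get-AzStorageBlobContent -Blob $_.Name `
                                 -Container 'backups' `
                                 -Destination $Destination `
                                 -Context $ctx `
                                 -Force
    }
```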