AzCopy – Announcing General Availability of AzCopy 3.0 plus preview release of AzCopy 4.0 with Table and File support

We are pleased to announce that AzCopy is now GA.  

Starting with this release, we will publish two AzCopy series: the RTM series, which includes only the GA features, and the pre-release series, which includes both the GA and the preview features.

You can download either AzCopy 3.0.0, which has blob copy functionality only, or AzCopy 4.0.0-preview, which includes the GA features plus the Storage Table entity copy feature that is still in preview.

AzCopy 3.0.0 - Generally Available

AzCopy GA version 3.0.0 includes the following changes:

  • AzCopy now requires the end user to explicitly specify every parameter’s name. In previous releases, the source, destination and file pattern parameters did not require parameter names. Starting from 3.0.0, the command line ‘AzCopy <source> <dest> [pattern] [options]’ needs to be changed to:

AzCopy /Source:<source> /Dest:<destination> /Pattern:<pattern> [Options] …

As a result of this change, parameters such as source and destination no longer need to follow any particular order.
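
For scripts that still assemble the old positional command line, the migration can be sketched in Python as follows. This is only an illustration: `to_named_args` is a hypothetical helper, not part of AzCopy, and it assumes that every option starts with `/` while the source, destination and pattern do not.

```python
def to_named_args(old_args):
    """Convert '<source> <dest> [pattern] [options]' to the named form.

    Hypothetical helper: it only rearranges strings and never runs AzCopy.
    """
    # Assumption: options start with '/', positional values do not.
    positional = [a for a in old_args if not a.startswith("/")]
    options = [a for a in old_args if a.startswith("/")]
    if not 2 <= len(positional) <= 3:
        raise ValueError("expected <source> <dest> [pattern]")

    named = ["/Source:" + positional[0], "/Dest:" + positional[1]]
    if len(positional) == 3:
        named.append("/Pattern:" + positional[2])
    return named + options


old = ["https://myaccount.blob.core.windows.net/mycontainer/",
       "D:\\test\\", "*.txt", "/SourceKey:key"]
print(" ".join(["AzCopy"] + to_named_args(old)))
```

Because every value now carries its own name, the resulting arguments could be emitted in any order.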

  • We have also made the following changes to the AzCopy command-line help messages:
    • Type ‘AzCopy’ to get the short version of the help.
    • Type ‘AzCopy /?’ to get detailed command-line help.
    • Type ‘AzCopy /?:Sample’ to get command-line samples.
    • Type ‘AzCopy /?:<option name>’ to get detailed help for the named AzCopy option, e.g.

          AzCopy /?:SourceKey

  • In previous versions of AzCopy, if the user chose NOT to overwrite existing files or blobs, AzCopy assigned the ‘failed’ status to files or blobs that already existed. From 3.0.0, AzCopy assigns the ‘skipped’ status to such files and displays ‘Transfer skipped: <Total skipped count>’ as part of the ‘Transfer summary’ in the console window.


AzCopy 4.0.0-preview - Copy Azure Storage Table Entities (New Preview)

Besides copying blobs and Azure Files, AzCopy 4.0.0-preview also supports exporting table entities to local files or to Azure Storage block blobs, and importing the data back into a storage table. Note that an export is not a consistent snapshot of the table, since entities may change at any time before AzCopy finishes retrieving all of them.

  • When exporting table entities, the user can set the /Dest parameter to a local folder or a blob container, e.g.

AzCopy /Source:https://myaccount.table.core.windows.net/myTable/ /Dest:D:\test\ /SourceKey:key

AzCopy /Source:https://myaccount.table.core.windows.net/myTable/ /Dest:https://myaccount.blob.core.windows.net/mycontainer/ /SourceKey:key1 /DestKey:key2

AzCopy will generate JSON data files in the local folder or blob container with the following naming convention:

<account name>_<table name>_<timestamp>_<volume index>_<CRC>.json

  • By default, AzCopy generates one JSON data file. The user can specify /SplitSize:<split file size in MB> to generate multiple data files, e.g.

AzCopy /Source:https://myaccount.table.core.windows.net/myTable/ /Dest:D:\test\ /SourceKey:key /SplitSize:100

AzCopy uses the ‘volume index’ in the data file names to distinguish multiple files. The ‘volume index’ contains two parts, the ‘partition key range index’ and the ‘split file index’ (both starting from 0). The ‘partition key range index’ will be 0 if the user does not specify the option /PKRS, which is introduced in the next section.

For instance, if AzCopy generates two data files after the user specifies the option /SplitSize, the data file names may look like the following:

myaccount_mytable_20140903T051850.8128447Z_0_0_C3040FE8.json
myaccount_mytable_20140903T051850.8128447Z_0_1_0AB9AC20.json

Note that the minimum split size is 32MB. If the destination is blob storage, AzCopy will also split the data file once it reaches the blob size limit (200GB), even if the end user does not specify the option /SplitSize.
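
The naming convention above can be taken apart with a few lines of Python. This is a minimal sketch, not AzCopy code: `parse_data_file_name` and `DataFileName` are hypothetical names, and the sketch assumes that none of the fields contains an underscore.

```python
from typing import NamedTuple


class DataFileName(NamedTuple):
    account: str
    table: str
    timestamp: str
    pk_range_index: int   # first part of the volume index
    split_index: int      # second part of the volume index
    crc: str              # hex string taken from the file name


def parse_data_file_name(name: str) -> DataFileName:
    """Split '<account>_<table>_<timestamp>_<volume index>_<CRC>.json'."""
    if not name.endswith(".json"):
        raise ValueError("not a JSON data file: " + name)
    # Assumption: fields never contain '_', so a plain split is enough.
    parts = name[:-len(".json")].split("_")
    if len(parts) != 6:
        raise ValueError("unexpected data file name: " + name)
    account, table, timestamp, pk_range, split, crc = parts
    return DataFileName(account, table, timestamp,
                        int(pk_range), int(split), crc)


info = parse_data_file_name(
    "myaccount_mytable_20140903T051850.8128447Z_0_1_0AB9AC20.json")
print(info.pk_range_index, info.split_index, info.crc)
```

For the second sample file, the partition key range index is 0 and the split file index is 1.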

  • By default, AzCopy exports the whole table’s entities serially. To export concurrently, the user needs to specify the option /PKRS:<partition key range split>. Use this option with caution, since the Azure Table service is a key lookup store and is not built for efficient scans. Too many scans on a table can lead to throttling of live traffic.

For instance, when the option /PKRS:"aa#bb" is specified, AzCopy will start three concurrent operations to export the three partition key ranges below:

[<first partition key>, aa)
[aa, bb)
[bb, <last partition key>]

AzCopy /Source:https://myaccount.table.core.windows.net/myTable/ /Dest:D:\test\ /SourceKey:key /PKRS:"aa#bb"

And the generated JSON data files may look like this:

myaccount_mytable_20140903T051850.8128447Z_0_0_C3040FE8.json
myaccount_mytable_20140903T051850.8128447Z_1_0_0AB9AC20.json
myaccount_mytable_20140903T051850.8128447Z_2_0_939AF48C.json

Note that the number of concurrent operations is also controlled by the option /NC. AzCopy uses the number of cores on the machine as the default value of /NC when copying table entities. When the user specifies the option /PKRS, AzCopy uses the smaller of the two values, the number of partition key ranges or the value specified in /NC, as the number of concurrent operations. For more details about /NC, type ‘AzCopy /?:NC’.
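
The way a /PKRS value maps to ranges, and the way the range count and /NC together bound the concurrency, can be sketched in Python. Both functions are hypothetical illustrations of the rules described above rather than AzCopy's implementation; `None` stands in for the open end of the first and last range.

```python
import os


def partition_key_ranges(pkrs):
    """Turn a /PKRS value such as 'aa#bb' into (lower, upper) pairs.

    None marks an open end: the first or the last partition key.
    """
    split_points = pkrs.split("#")
    lowers = [None] + split_points
    uppers = split_points + [None]
    return list(zip(lowers, uppers))


def concurrent_operations(range_count, nc=None):
    """Smaller of the number of ranges and the /NC value."""
    if nc is None:
        # Default /NC for table copies: the number of cores.
        nc = os.cpu_count() or 1
    return min(range_count, nc)


ranges = partition_key_ranges("aa#bb")
print(ranges)
print(concurrent_operations(len(ranges), nc=2))
```

With "aa#bb" there are three ranges, so an /NC value of 2 would cap the export at two concurrent operations, while a value of 8 would still yield only three.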

  • When importing the data files back into a table, the user needs to specify both the /Manifest and /EntityOperation options.

AzCopy /Source:D:\test\ /Dest:https://myaccount.table.core.windows.net/mytable1/ /DestKey:key /Manifest:"myaccount_mytable_20140103T112020.manifest" /EntityOperation:InsertOrReplace

AzCopy /Source:https://myaccount.blob.core.windows.net/mycontainer/ /Dest:https://myaccount.table.core.windows.net/mytable1/ /SourceKey:key1 /DestKey:key2 /Manifest:"myaccount_mytable_20140103T112020.manifest" /EntityOperation:InsertOrReplace

The manifest file is generated in the destination local folder or blob container when the user exports table entities using AzCopy. It is used to locate all the data files and to perform data validation during import. The manifest file uses the following naming convention:

    <account name>_<table name>_<timestamp>.manifest
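
According to the storage team's reply in the comments below, the CRC in the manifest is an XOR of the CRCs of all the data files. A minimal Python sketch of that combination step follows. It assumes the CRC field in each file name is a hexadecimal number, as in the sample names above, and `combined_crc` is a hypothetical helper. It does not compute the per-file CRC itself (this post does not specify that algorithm), and whether the result matches the checksum stored in a real manifest has not been verified here.

```python
from functools import reduce
from operator import xor


def combined_crc(data_file_names):
    """XOR together the hex CRC fields found in the data file names."""
    crcs = []
    for name in data_file_names:
        # The CRC is the last '_'-separated field, before '.json'.
        crc_hex = name.rsplit("_", 1)[1].split(".")[0]
        crcs.append(int(crc_hex, 16))
    return reduce(xor, crcs, 0)


files = [
    "myaccount_mytable_20140903T051850.8128447Z_0_0_C3040FE8.json",
    "myaccount_mytable_20140903T051850.8128447Z_0_1_0AB9AC20.json",
]
print(format(combined_crc(files), "08X"))
```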

The option /EntityOperation is used to govern the behavior of entity importing:

    • InsertOrSkip - Skips an existing entity or inserts a new entity if it does not exist in the table.
    • InsertOrMerge - Merges an existing entity or inserts a new entity if it does not exist in the table.
    • InsertOrReplace - Replaces an existing entity or inserts a new entity if it does not exist in the table.
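
The difference between the three operations can be modeled on a plain Python dictionary. This is a toy model, not the Table service itself: the table is a dict keyed by a hypothetical (partition key, row key) tuple, and "merge" is assumed to overwrite the incoming properties while keeping the other existing ones.

```python
def import_entity(table, key, incoming, operation):
    """Apply one imported entity to an in-memory 'table' (a dict)."""
    if operation not in ("InsertOrSkip", "InsertOrMerge", "InsertOrReplace"):
        raise ValueError("unknown entity operation: " + operation)

    if key not in table:
        # All three operations insert an entity that does not exist yet.
        table[key] = dict(incoming)
    elif operation == "InsertOrMerge":
        # Keep existing properties, overwrite those present in incoming.
        table[key].update(incoming)
    elif operation == "InsertOrReplace":
        # Drop the existing entity entirely.
        table[key] = dict(incoming)
    # InsertOrSkip leaves an existing entity untouched.


table = {("pk", "rk"): {"a": 1, "b": 2}}
import_entity(table, ("pk", "rk"), {"b": 3, "c": 4}, "InsertOrMerge")
print(table[("pk", "rk")])
```

Running the same call with InsertOrReplace would leave only the incoming properties, and InsertOrSkip would leave the original entity unchanged.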

Note that the option /PKRS cannot be used when importing entities. In the import scenario, AzCopy starts concurrent operations by default. The default number of concurrent operations equals the number of cores on the machine, but the user can change it by specifying the option /NC. For more details, type ‘AzCopy /?:NC’.
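
A script that drives the import could check these requirements before launching AzCopy. The Python sketch below only assembles the argument list and does not run anything. `build_import_args` is a hypothetical helper, and writing the concurrency option as `/NC:<number>` is an assumption based on the `/Option:value` pattern of the other options in this post.

```python
ENTITY_OPERATIONS = ("InsertOrSkip", "InsertOrMerge", "InsertOrReplace")


def build_import_args(source, dest_table_url, dest_key, manifest,
                      entity_operation, source_key=None, nc=None):
    """Assemble a table-import command line as a list of arguments."""
    if not manifest:
        raise ValueError("/Manifest is required when importing")
    if entity_operation not in ENTITY_OPERATIONS:
        raise ValueError("unknown /EntityOperation: " + entity_operation)

    args = ["AzCopy", "/Source:" + source, "/Dest:" + dest_table_url]
    if source_key is not None:
        # Only needed when the data files live in a blob container.
        args.append("/SourceKey:" + source_key)
    args += ["/DestKey:" + dest_key,
             "/Manifest:" + manifest,
             "/EntityOperation:" + entity_operation]
    if nc is not None:
        # Assumed spelling of the concurrency option.
        args.append("/NC:" + str(nc))
    return args


print(" ".join(build_import_args(
    "D:\\test\\",
    "https://myaccount.table.core.windows.net/mytable1/",
    "key",
    "myaccount_mytable_20140103T112020.manifest",
    "InsertOrReplace")))
```

The helper deliberately offers no /PKRS parameter, since that option applies to export only.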

As always, we are looking forward to your feedback.

Microsoft Azure Storage Team

Comments

  • Anonymous
    October 29, 2014
    It's a great tool. Some feedback for the v4 preview: can you add a switch to skip checksum validation for the table data (the one from the .manifest file, e.g. "Checksum":4214960409)? It's a reasonable scenario to download a small table (e.g. with configuration), change some values and re-upload it. The current version will fail data verification if anything was changed in the JSON. Or at least provide the algorithm used to calculate this checksum (it doesn't look like md5 or sha1 or crc32 hex->dec). PS: Are you going to (eventually) open-source it for the community, like some other Azure-related tools?

  • Anonymous
    October 29, 2014
    Hi Alexey, Thanks for your feedback. We've already put 'skip validation when importing table entities' and 'open source' into our backlog, but there is no concrete timeline to share yet. Regarding the data validation algorithm, the CRC in the data file’s name is the data file’s bit-level CRC, and the CRC in the manifest file is an XOR of all the data files’ CRCs. Zhiming

  • Anonymous
    December 02, 2014
    Thanks for sharing. I noticed the following in the license agreement "You may not ... publish the software for others to copy" -- clearly this must be a mistake? Unless you actually mean I'm not allowed to publish it on chocolatey.org or you intend to publish it yourself. Unfortunately if it's not on chocolatey I'm not going to provision it as part of my infrastructure.

  • Anonymous
    December 03, 2014
    I need to actually move blobs from one container to the other in batches of, say, 1000.  What parameters would I need to do that? Thanks, __Birm

  • Anonymous
    December 07, 2014
    When I saw the title (...File Support), I was hoping you added the ability to move files from Blob storage to an Azure File Share and vice versa. Any chance of this feature being added any time soon?

  • Anonymous
    December 07, 2014
    Hi Robin, Thanks for your feedback. We've already put copying from File to Blob and from Blob to File on our feature list, and it should be released in the near future, but there is no concrete timeline to share yet. Zhiming

  • Anonymous
    December 12, 2014
    Hi, is it possible to use AzCopy to copy all the blobs in a container into a table? We want to shift millions of small JSON blobs into table storage for better lookup scalability. If it takes hours to do while the system is running, how do we make sure every blob was copied and none were missed?

  • Anonymous
    December 14, 2014
    Hi Toby, You can use AzCopy to export table entities into blobs in JSON format and import them back into a table. However, AzCopy does not yet support copying arbitrary JSON blobs into tables, because it currently uses the self-defined manifest file generated during export to validate the imported data. Zhiming