How do we ensure data consistency in the results of AzCopy?

VJ-8370
2020-08-05T10:39:21.197+00:00

Hi Team,

Can anyone confirm how we can ensure data consistency in the results of AzCopy? How do we verify that the data was copied without any issues (without opening every file, as there are thousands of them)?

Regards,
VJ

Azure Storage Accounts

Accepted answer
    Sumarigo-MSFT (Microsoft Employee)
    2020-08-05T12:42:59.003+00:00

    @VJ-8370 Adding more information: once you start the process, AzCopy copies all of the data (for both uploads and downloads).
    There won't be any issues with consistency!

    AzCopy creates log and plan files for every job. You can use the logs to investigate and troubleshoot any potential problems.
    The logs contain the failure status (UPLOADFAILED, COPYFAILED, or DOWNLOADFAILED), the full path, and the reason for the failure.
    By default, the log and plan files are located in the %USERPROFILE%\.azcopy directory on Windows or the $HOME/.azcopy directory on Mac and Linux.
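    Because failures are recorded in the logs with these status markers, one way to check thousands of files without opening any of them is to search the logs for the markers. Below is a minimal Python sketch, assuming the job logs are the *.log files in the default log directory; the helper names default_log_dir and find_failures are hypothetical, and the code only matches the marker strings, not any particular log line layout.

```python
import glob
import os

# Failure markers named in the answer above.
FAILURE_MARKERS = ("UPLOADFAILED", "COPYFAILED", "DOWNLOADFAILED")


def default_log_dir() -> str:
    """Assumed default log directory: ~/.azcopy (%USERPROFILE%\\.azcopy on Windows)."""
    return os.path.join(os.path.expanduser("~"), ".azcopy")


def find_failures(log_dir: str) -> list[tuple[str, int, str]]:
    """Return (log file, line number, line text) for each line containing a failure marker."""
    hits = []
    # Assumption: AzCopy job logs are the *.log files in log_dir.
    for path in sorted(glob.glob(os.path.join(log_dir, "*.log"))):
        with open(path, encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if any(marker in line for marker in FAILURE_MARKERS):
                    hits.append((path, lineno, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    failures = find_failures(default_log_dir())
    for path, lineno, line in failures:
        print(f"{os.path.basename(path)}:{lineno}: {line}")
    print(f"{len(failures)} failure line(s) found")
```

    An empty result means no log line mentioned one of the three failure statuses; on its own it does not prove that every file was processed.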

    Note: When you resume a job, AzCopy looks at the job plan file, which lists all the files that were identified for processing when the job was first created. AzCopy then attempts to transfer every file listed in the plan file that wasn't already transferred.

    Data transfers use spare bandwidth, and there is no SLA on whether they will be fast or slow.

    Will 'sync' delete files in the destination if they no longer exist in the source location?
    The 'sync' command doesn't delete files in the destination unless you use the optional --delete-destination flag. To learn more, see Synchronize files.
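    To see which destination files are affected by this default, you can compare the file listings of both sides. The Python sketch below uses two local directories as stand-ins for source and destination (an assumption; a real destination would be a blob container or file share listing), and the helper names relative_files and destination_only are hypothetical. It lists the files that exist only in the destination: the ones a default sync leaves in place and that the optional deletion flag (--delete-destination in AzCopy v10) would target. AzCopy's own comparison logic, such as how it decides which files to re-copy, is not modeled here.

```python
import os


def relative_files(root: str) -> set[str]:
    """Collect every file under root as a '/'-separated path relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.add(rel.replace(os.sep, "/"))
    return found


def destination_only(source_root: str, dest_root: str) -> list[str]:
    """Files present in the destination but absent from the source, sorted.

    A default sync leaves these in place; they are removal candidates only
    when deletion is explicitly enabled.
    """
    return sorted(relative_files(dest_root) - relative_files(source_root))
```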

    Regarding partial data, see my comment above (in the thread Purge Bad Blocks). AzCopy is designed to delete partial data after failures IF it has deletion rights and IF the failure is a controlled one (i.e., one that AzCopy can detect and respond to by performing cleanup). Sudden termination of the AzCopy process itself is not a controlled failure, so in that case AzCopy cannot clean up partially completed destination files.

    For example, killing the process from Task Manager is an uncontrolled failure. (BTW, block blobs are a special case with stricter behavior around partial data, because PutBlock and PutBlockList are atomic. Therefore, on block blobs you won't see incomplete data even after an uncontrolled failure, whereas on page blobs and Azure Files you can see incomplete data after an uncontrolled failure because they are not updated atomically.)
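    The atomicity argument for block blobs can be illustrated with a toy model. The Python class below is hypothetical and is not the Azure SDK: staged blocks (as with Put Block) stay invisible, and the visible content changes in a single step only when a block list is committed (as with Put Block List). It simplifies the real service, for example by ignoring previously committed blocks and base64-encoded block IDs. A crash between staging and committing therefore leaves readers with the old content rather than a partial file.

```python
class ToyBlockBlob:
    """Toy in-memory model of block blob commit semantics (hypothetical, not the Azure SDK)."""

    def __init__(self) -> None:
        self._committed = b""                     # content readers can see
        self._uncommitted: dict[str, bytes] = {}  # staged, not yet visible

    def put_block(self, block_id: str, data: bytes) -> None:
        # Analogous to Put Block: stage data without changing visible content.
        self._uncommitted[block_id] = data

    def put_block_list(self, block_ids: list[str]) -> None:
        # Analogous to Put Block List: build the new content first, then swap
        # it in with one assignment, so a reader never sees a half-built blob.
        missing = [b for b in block_ids if b not in self._uncommitted]
        if missing:
            raise KeyError(f"blocks not staged: {missing}")
        self._committed = b"".join(self._uncommitted[b] for b in block_ids)
        self._uncommitted.clear()

    def read(self) -> bytes:
        return self._committed


if __name__ == "__main__":
    blob = ToyBlockBlob()
    blob.put_block("0001", b"hello ")
    blob.put_block("0002", b"world")
    # A crash at this point leaves nothing committed: readers see the old content.
    print(blob.read())  # prints b''
    blob.put_block_list(["0001", "0002"])
    print(blob.read())  # prints b'hello world'
```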

    Additional information: To validate data integrity, download the file using AzCopy with the /CheckMD5 option (the --check-md5 flag in AzCopy v10), and then compare the downloaded file with your local original.

    However, given that AzCopy already makes a best effort to protect data integrity during transfer, the validation step above is probably redundant and not recommended unless data integrity is much more important than performance.
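    If you do decide to run that validation, comparing thousands of files by hand is impractical, so the comparison can be scripted. Below is a minimal Python sketch, assuming the original files and the downloaded copy are both available as local directory trees; the function names md5_of_file, tree_hashes, and compare_trees are hypothetical. It hashes every file with MD5 and reports files that are missing, unexpected, or whose contents differ.

```python
import hashlib
import os


def md5_of_file(path: str, chunk_size: int = 4 * 1024 * 1024) -> str:
    """Hex MD5 of a file, read in chunks so large files need not fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_hashes(root: str) -> dict[str, str]:
    """Map each file's '/'-separated path relative to root to its MD5."""
    hashes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            hashes[rel] = md5_of_file(full)
    return hashes


def compare_trees(original: str, downloaded: str) -> dict[str, list[str]]:
    """Report files missing from the download, unexpected extras, and content mismatches."""
    a, b = tree_hashes(original), tree_hashes(downloaded)
    return {
        "missing": sorted(a.keys() - b.keys()),
        "extra": sorted(b.keys() - a.keys()),
        "mismatched": sorted(p for p in a.keys() & b.keys() if a[p] != b[p]),
    }
```

    If compare_trees(original_dir, downloaded_dir) returns three empty lists, every original file came back with identical content. Reading in chunks keeps memory use flat even for very large files.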

    Hope this helps!

    Kindly let us know if the above helps or you need further assistance on this issue.

1 additional answer

    Manu Philip (MVP)
    2020-08-05T10:54:22.29+00:00

    Hi,
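    Append the --put-md5 flag to each copy command to help ensure data consistency. With this flag (available only when uploading), AzCopy creates an MD5 hash of each file and saves it as the Content-MD5 property of the destination blob or file. Please refer to: storage-use-azcopy-files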
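    The hash that --put-md5 stores in the destination's Content-MD5 property is the base64-encoded MD5 digest, not the hex string many tools print, so a local check has to use the same encoding. Below is a minimal Python sketch; the helper names content_md5 and matches_stored_md5 are hypothetical, and how you read the stored Content-MD5 value from the destination (portal, SDK, or CLI) is left out.

```python
import base64
import hashlib


def content_md5(path: str, chunk_size: int = 4 * 1024 * 1024) -> str:
    """Base64-encoded MD5 digest of a file, the encoding used by Content-MD5."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return base64.b64encode(h.digest()).decode("ascii")


def matches_stored_md5(path: str, stored_content_md5: str) -> bool:
    """True if the local file's MD5 equals a Content-MD5 value read from the destination."""
    return content_md5(path) == stored_content_md5.strip()
```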
