AzCopy log file for MD5 mismatch

Zeno GH 25 Reputation points
2024-05-30T14:19:17.31+00:00

Hi, I'm using AzCopy version 10.17.0 to download files from a storage account. I'm using --put-md5 and --check-md5 to validate file integrity. However, if there is an MD5 mismatch for one of the blobs on download, how will that be logged in the AzCopy.log file?

As per the documentation, the AzCopy log has 3 failure categories: UPLOADFAILED, COPYFAILED, and DOWNLOADFAILED.

So will an MD5 mismatch come under DOWNLOADFAILED? Assume that the file is downloaded but some data is lost while writing it to disk.

Regards

Azure Storage Accounts

Accepted answer
  1. Sina Salam 10,491 Reputation points
    2024-05-30T14:36:07.6866667+00:00

    Hello Zeno GH,

    Welcome to Microsoft Q&A, and thank you for posting your question here.

    I understand that you want to know how an MD5 mismatch during download is logged in the AzCopy log file: specifically, whether it is categorized under DOWNLOADFAILED and what the log entry would look like.

    Yes. If there is an MD5 mismatch during download, AzCopy logs it under the DOWNLOADFAILED category in the log file. An MD5 mismatch indicates that the file's integrity was compromised during the transfer, which is effectively a failed download. This assumes --check-md5 is set to a failing mode such as FailIfDifferent, which is the default.

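    The following is an illustrative example of what a log entry for an MD5 mismatch might look like. The exact layout and field names can differ between AzCopy versions, so check it against a real log file from your environment: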

    {
      "time": "2024-05-30T12:34:56.789Z",
      "level": "ERROR",
      "msg": "DOWNLOADFAILED",
      "source": "https://mystorageaccount.blob.core.windows.net/mycontainer/myblob",
      "destination": "/local/path/to/myblob",
      "error": {
        "code": "MD5Mismatch",
        "message": "The MD5 hash of the downloaded file does not match the expected hash."
      }
    }
    

    You can also use a script to scan AzCopy log files for MD5 mismatches. The following Python example assumes the log entries are JSON objects with the fields shown above, one entry per line:

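    import json
    import os

    log_directory = "<path-to-log-directory>"

    def check_md5_mismatches(log_directory):
        """Print DOWNLOADFAILED entries whose error code is MD5Mismatch."""
        for filename in os.listdir(log_directory):
            if not filename.endswith(".log"):
                continue
            with open(os.path.join(log_directory, filename), "r") as log_file:
                # Assumes one JSON object per line; other lines are skipped.
                for line in log_file:
                    try:
                        log_entry = json.loads(line)
                    except json.JSONDecodeError:
                        continue
                    if not isinstance(log_entry, dict):
                        continue
                    error = log_entry.get("error")
                    code = error.get("code") if isinstance(error, dict) else None
                    if log_entry.get("msg") == "DOWNLOADFAILED" and code == "MD5Mismatch":
                        print(f"MD5 mismatch found in log {filename}:")
                        print(json.dumps(log_entry, indent=4))

    # Run the function
    check_md5_mismatches(log_directory)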
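
The script above only works if every log line parses as JSON. If the log lines in your AzCopy version turn out to be plain text, a simpler keyword scan may be more robust. The sketch below is a minimal example under two assumptions: failed downloads contain the keyword DOWNLOADFAILED (the category named in the question), and an MD5-related failure mentions "MD5" somewhere on the same line. The function name `find_failed_md5_lines` is hypothetical and chosen only for illustration.

```python
import os
from typing import Iterator, Tuple


def find_failed_md5_lines(log_directory: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (filename, line_number, line) for suspicious log lines.

    A line is reported when it contains the keyword DOWNLOADFAILED and
    also mentions "md5" in any letter case. Both conditions are
    assumptions about the log text, not a documented AzCopy format.
    """
    for filename in sorted(os.listdir(log_directory)):
        if not filename.endswith(".log"):
            continue
        path = os.path.join(log_directory, filename)
        # errors="replace" keeps the scan going if a log has odd bytes.
        with open(path, "r", encoding="utf-8", errors="replace") as log_file:
            for line_number, line in enumerate(log_file, start=1):
                if "DOWNLOADFAILED" in line and "md5" in line.lower():
                    yield filename, line_number, line.rstrip("\n")
```

To use it, iterate over `find_failed_md5_lines("<path-to-log-directory>")` and print or collect each tuple. Because it never parses the line, it cannot tell an MD5 mismatch apart from any other failed download whose message happens to mention MD5, so treat the matches as candidates to review.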
    

    In summary, --put-md5 stores an MD5 hash with the blob at upload time, and --check-md5 validates that hash when the blob is downloaded, so using both lets you verify file integrity on download. If an MD5 mismatch occurs, it is logged under the DOWNLOADFAILED category. Monitoring these logs, and re-downloading and re-verifying any affected blobs, helps maintain data integrity. The Python script above can help automate the detection of such mismatches in AzCopy logs.
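
To follow up on a blob that was reported as failed, you can recompute the MD5 of the local file and compare it with the hash stored on the blob. The sketch below is a minimal example, assuming you have already obtained the blob's Content-MD5 value by some other means and that it is the base64-encoded MD5 digest. The function names `local_md5_base64` and `matches_expected_md5` are hypothetical.

```python
import base64
import hashlib


def local_md5_base64(path: str, chunk_size: int = 4 * 1024 * 1024) -> str:
    """Return the base64-encoded MD5 digest of the file at `path`.

    The file is read in chunks so large downloads do not need to fit
    in memory.
    """
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return base64.b64encode(md5.digest()).decode("ascii")


def matches_expected_md5(path: str, expected_md5_base64: str) -> bool:
    """Compare the local file's MD5 with an expected base64 value.

    `expected_md5_base64` is assumed to be the blob's Content-MD5
    property, supplied by the caller.
    """
    return local_md5_base64(path) == expected_md5_base64.strip()
```

If `matches_expected_md5` returns False for a file that AzCopy reported as downloaded, the local copy differs from the hash stored on the blob and should be downloaded again.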

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.

    ** Please don't forget to close the thread by upvoting and accepting this as the answer if it is helpful ** so that others in the community facing similar issues can easily find the solution.

    Best Regards,

    Sina Salam

