Previous version of an updated file in Azure Data Lake with hierarchical namespace enabled (container)

Vaibhav Jain 0 Reputation points
2024-06-19T14:47:52.3766667+00:00

How can we retrieve a previous version of an uploaded or modified file when the storage account is Azure Data Lake with hierarchical namespace enabled?

Could you please suggest how to get a previous version, whether through versioning, snapshots, or, as a last option, backup and restore?

All the articles say that versioning is only available for blobs when hierarchical namespace is disabled, but we would like to keep hierarchical namespace enabled and store the files in Azure Storage.

We are now looking for a way to get the previous version of modified files.

Any help would be appreciated.

Azure Data Lake Storage

1 answer

  1. Anand Prakash Yadav 7,700 Reputation points Microsoft Vendor
    2024-06-20T08:57:41.87+00:00

    Hello Vaibhav Jain,

    Thank you for posting your query here!

    You’re correct that blob versioning is not supported on Azure Data Lake Storage Gen2 accounts with a hierarchical namespace enabled. However, there are alternative approaches you can consider to achieve similar functionality:

    1. Snapshots:

    While blob versioning isn’t available, you can create snapshots of your files. A snapshot captures the state of a file at a specific point in time and is read-only, so it preserves the file’s content as it was when the snapshot was taken. Snapshot support on accounts with a hierarchical namespace may be limited or in preview, so confirm availability for your account in the Blob Storage feature support documentation before relying on it.
    To create a snapshot, use the az storage blob snapshot command or Azure Storage Explorer.
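
    The snapshot approach above can also be scripted. The following is a minimal sketch, assuming the azure-storage-blob Python package (v12) and an account where snapshots are available. The helper names take_snapshot, read_snapshot and example_usage are hypothetical, and the connection string, container and path are placeholders.

```python
from typing import Any


def take_snapshot(blob_client: Any) -> str:
    """Create a read-only snapshot of a blob and return the snapshot ID."""
    # BlobClient.create_snapshot() returns a dict of properties; the
    # snapshot identifier (a timestamp string) is under the "snapshot" key.
    props = blob_client.create_snapshot()
    return props["snapshot"]


def read_snapshot(container_client: Any, blob_name: str, snapshot_id: str) -> bytes:
    """Download the content the blob had when the given snapshot was taken."""
    snapshot_client = container_client.get_blob_client(blob_name, snapshot=snapshot_id)
    return snapshot_client.download_blob().readall()


def example_usage() -> None:
    """Not called automatically; placeholders must be replaced first."""
    # Assumption: azure-storage-blob is installed (pip install azure-storage-blob).
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string("<connection-string>")
    container = service.get_container_client("<container>")
    snapshot_id = take_snapshot(container.get_blob_client("<path/to/file>"))
    # Later, after the file has been modified, the old content is still readable.
    old_content = read_snapshot(container, "<path/to/file>", snapshot_id)
    print(len(old_content), "bytes in snapshot", snapshot_id)
```

    Storing the returned snapshot ID somewhere durable (for example, alongside the pipeline run that modified the file) makes it possible to find the right snapshot later.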

    2. Backup and Restore:

    Implement a backup strategy by periodically copying your data to a separate storage account or another Azure service (e.g., Azure Blob Storage with versioning enabled). You can use tools like AzCopy (ensure you’re using the current major version, AzCopy v10) to create the copies, or set up data pipelines that copy data to the backup location on a schedule.
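
    A scheduled backup of this kind could be driven by a small script. The sketch below assumes AzCopy v10 is installed and on the PATH, and that both URLs carry whatever authorization they need (for example, a SAS token in the query string). The function names and the timestamped-folder layout are illustrative choices, not a prescribed convention.

```python
import subprocess
from datetime import datetime, timezone
from typing import List, Optional
from urllib.parse import urlsplit, urlunsplit


def timestamped_destination(backup_container_url: str, now: datetime) -> str:
    """Append a UTC timestamp folder to the URL path, keeping any SAS query string."""
    parts = urlsplit(backup_container_url)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    new_path = parts.path.rstrip("/") + "/" + stamp
    return urlunsplit(parts._replace(path=new_path))


def build_backup_command(
    source_url: str, backup_container_url: str, now: Optional[datetime] = None
) -> List[str]:
    """Build an 'azcopy copy' command that copies the source tree recursively."""
    now = now or datetime.now(timezone.utc)
    destination = timestamped_destination(backup_container_url, now)
    return ["azcopy", "copy", source_url, destination, "--recursive"]


def run_backup(source_url: str, backup_container_url: str) -> None:
    """Run the copy; raises CalledProcessError if AzCopy exits with an error."""
    subprocess.run(build_backup_command(source_url, backup_container_url), check=True)
```

    Because each run writes to its own timestamped folder, earlier copies are never overwritten, and the previous state of a file can be read back from the folder of an earlier run.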

    3. Custom Versioning Solution:

    Although Azure Data Lake Storage Gen2 doesn’t natively support blob versioning, you can build a custom solution using metadata or additional storage. For example, maintain a separate table or log that tracks versions of files. When a file is modified, update the version information in your custom tracking system.
    This approach requires custom development but allows you to manage versions according to your specific requirements.
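
    As an illustration of the tracking-log idea, here is a minimal sketch that keeps one JSON line per file version. The VersionLog class, the field names and the "_versions/<path>/v<N>" archive layout are all hypothetical. The sketch only records metadata; copying the old content to the archive path in the data lake is left to the caller.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List


class VersionLog:
    """Append-only JSON-lines log that tracks versions of files by path."""

    def __init__(self, log_path: str) -> None:
        self.log_path = Path(log_path)

    def _entries(self) -> List[Dict]:
        if not self.log_path.exists():
            return []
        with self.log_path.open("r", encoding="utf-8") as handle:
            return [json.loads(line) for line in handle if line.strip()]

    def history(self, file_path: str) -> List[Dict]:
        """Return all recorded versions of one file, oldest first."""
        return [entry for entry in self._entries() if entry["path"] == file_path]

    def record(self, file_path: str, content: bytes) -> Dict:
        """Record a new version of a file and return the log entry."""
        version = len(self.history(file_path)) + 1
        entry = {
            "path": file_path,
            "version": version,
            "sha256": hashlib.sha256(content).hexdigest(),
            # Hypothetical location where the caller stores this version's bytes.
            "archived_as": f"_versions/{file_path}/v{version}",
            "recorded_at": datetime.now(timezone.utc).isoformat(),
        }
        with self.log_path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(entry) + "\n")
        return entry
```

    In practice the log itself would likely live in a database table or in the storage account rather than on local disk, and concurrent writers would need some form of locking. Both are outside the scope of this sketch.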

    Please note that enabling a hierarchical namespace is irreversible.

    Do let us know if you have any further queries. I’m happy to assist you further.

    Please do not forget to “Accept the answer” and “up-vote” wherever the information provided helps you; this can be beneficial to other community members.