Rename multiple folder names in a data lake

Zhu, Yueli YZ [NC] 235 Reputation points
2023-12-28T18:49:09.8433333+00:00

Hi,

I have a typo in the folder names that were created when I used ADF to copy data to Data Lake Gen2. The files are very large, so I do not want to recopy them, but many folder names need to be changed. Instead of renaming them manually one by one, is there a way to rename all the folders at once? Here is an example:

output/a/2020:12.28/a.parquet

output/b/2020:12.28/b.parquet

output/c/2020:12.28/c.parquet

output/d/2020:12.28/d.parquet

How can I replace every 2020:12.28 above with 2020:12:28 at once?

Thanks


1 answer

  1. Konstantinos Passadis 17,381 Reputation points MVP
    2023-12-28T19:37:59.4+00:00

    Hello @Zhu, Yueli YZ [NC]!

    I understand your issue, but there is no native rename operation for blobs in Azure Storage, which is what ultimately serves the data lake.
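
    One caveat that may be worth checking before copying anything: if the storage account has hierarchical namespace enabled (an assumption about the account, not something stated in the question), the Data Lake Gen2 cmdlets in the Az.Storage module can move a directory to a new path without recopying the file contents. The following is only a sketch under that assumption and has not been run against real data. `Get-RenamedDirectoryPath` and `Rename-DateDirectory` are hypothetical helper names, and the `output` root and the two date strings are taken from the example in the question.

    ```powershell
    # Hypothetical helper: swaps the last path segment when it equals $OldName.
    function Get-RenamedDirectoryPath {
        param([string]$Path, [string]$OldName, [string]$NewName)
        $segments = $Path.TrimEnd('/') -split '/'
        if ($segments[-1] -ne $OldName) { return $null }   # not a directory we want to rename
        $segments[-1] = $NewName
        return ($segments -join '/')
    }

    # Hypothetical wrapper around the Az.Storage cmdlets; nothing runs until it is called.
    function Rename-DateDirectory {
        param(
            [Parameter(Mandatory)]$Context,               # storage context for the account
            [Parameter(Mandatory)][string]$FileSystem,    # container (file system) name
            [string]$RootPath = 'output',
            [string]$OldName  = '2020:12.28',
            [string]$NewName  = '2020:12:28',
            [switch]$Apply                                # without -Apply, only print the plan
        )
        # List everything under the root; very large listings may need paging
        # (-MaxCount / -ContinuationToken).
        $items = Get-AzDataLakeGen2ChildItem -Context $Context -FileSystem $FileSystem -Path $RootPath -Recurse
        foreach ($item in ($items | Where-Object { $_.IsDirectory })) {
            $newPath = Get-RenamedDirectoryPath -Path $item.Path -OldName $OldName -NewName $NewName
            if (-not $newPath) { continue }
            if ($Apply) {
                # Move the directory to its corrected path within the same file system.
                Move-AzDataLakeGen2Item -Context $Context -FileSystem $FileSystem -Path $item.Path -DestFileSystem $FileSystem -DestPath $newPath
            } else {
                "{0} -> {1}" -f $item.Path, $newPath
            }
        }
    }
    ```

    Calling `Rename-DateDirectory -Context $ctx -FileSystem "YourContainerName"` would print the planned renames, and adding `-Apply` would perform them. If hierarchical namespace is not enabled, a copy-based approach is needed instead.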

    I would move the contents into new folders (directories) with an automation script:

    # Connect to Azure account
    Connect-AzAccount
    
    # Set the context to the appropriate subscription
    Set-AzContext -SubscriptionId "YourSubscriptionId"
    
    # Set the variables
    $resourceGroupName = "YourResourceGroupName"
    $storageAccountName = "YourStorageAccountName"
    $containerName = "YourContainerName"
    $oldDateString = "2020:12.28"
    # Corrected folder name, as requested in the question
    $newDateString = "2020:12:28"
    
    # Get the context of the storage account
    $ctx = (Get-AzStorageAccount -ResourceGroupName $resourceGroupName -AccountName $storageAccountName).Context
    
    # List all blobs in the container and filter for the ones that need renaming
    $blobs = Get-AzStorageBlob -Container $containerName -Context $ctx | Where-Object { $_.Name -like "*$oldDateString*" }
    
    foreach ($blob in $blobs) {
        # Generate the new blob name (String.Replace matches literally, so "." is not treated as a regex wildcard)
        $newBlobName = $blob.Name.Replace($oldDateString, $newDateString)
        
        # Start a server-side copy of the blob to the new name/location
        Start-AzStorageBlobCopy -SrcBlob $blob.Name -SrcContainer $containerName -DestBlob $newBlobName -DestContainer $containerName -Context $ctx
        
        # IMPORTANT: only after confirming that the copy is complete, uncomment the next line to remove the old blob
        # Remove-AzStorageBlob -Blob $blob.Name -Container $containerName -Context $ctx
    }
    
    # Note: The script above doesn't delete the old blobs, it just copies them to the new location.
    # You should verify that all blobs are copied successfully before deleting the old ones.
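
    The verification mentioned in the note above could look roughly like this. It is a sketch, not a tested procedure: `Test-BlobCopySucceeded` is a hypothetical helper name, and it assumes that `Get-AzStorageBlobCopyState` with `-WaitForComplete` reports the final copy status and that comparing the `Length` of the source and destination blobs is an acceptable sanity check. Matching lengths do not prove that the contents are identical.

    ```powershell
    # Hypothetical helper built on Get-AzStorageBlobCopyState and Get-AzStorageBlob (Az.Storage).
    function Test-BlobCopySucceeded {
        param(
            [Parameter(Mandatory)]$Context,
            [Parameter(Mandatory)][string]$Container,
            [Parameter(Mandatory)][string]$SourceBlob,
            [Parameter(Mandatory)][string]$DestBlob
        )
        # Block until the server-side copy has finished, then read its final state.
        $state = Get-AzStorageBlobCopyState -Blob $DestBlob -Container $Container -Context $Context -WaitForComplete

        # Fetch both blobs so their sizes can be compared.
        $src = Get-AzStorageBlob -Blob $SourceBlob -Container $Container -Context $Context
        $dst = Get-AzStorageBlob -Blob $DestBlob -Container $Container -Context $Context

        # Report success only if the copy finished cleanly and the sizes match.
        $ok = ($state.Status -eq 'Success') -and ($src.Length -eq $dst.Length)
        return $ok
    }
    ```

    Inside the loop, the old blob would then be removed only when `Test-BlobCopySucceeded -Context $ctx -Container $containerName -SourceBlob $blob.Name -DestBlob $newBlobName` returns `$true`.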
    
    
    

    The code runs and does the job. However, I did get an error when I ran it; the result was still correct, so I ignored the error. Please try it out.

    Pay attention to the commented-out line: DO NOT DELETE THE OLD BLOBS unless you are sure the script does the job, so run some tests first!
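
    One way to test ahead is a dry run that only prints the old and new names and never touches the storage account. The sketch below uses the sample paths from the question; `Get-RenamedBlobPath` is a hypothetical helper name, not part of the Az module, and with real data the sample list would be replaced by the blob names returned from the listing step.

    ```powershell
    # Hypothetical helper: maps an old blob path to its new path.
    function Get-RenamedBlobPath {
        param(
            [Parameter(Mandatory)][string]$Path,
            [Parameter(Mandatory)][string]$OldText,
            [Parameter(Mandatory)][string]$NewText
        )
        # String.Replace does a literal match, so "." and ":" are not regex characters here.
        return $Path.Replace($OldText, $NewText)
    }

    # Sample paths taken from the question.
    $samplePaths = @(
        'output/a/2020:12.28/a.parquet',
        'output/b/2020:12.28/b.parquet',
        'output/c/2020:12.28/c.parquet',
        'output/d/2020:12.28/d.parquet'
    )

    # Build an old -> new mapping to review before copying anything.
    $plan = foreach ($p in $samplePaths) {
        [pscustomobject]@{
            OldName = $p
            NewName = Get-RenamedBlobPath -Path $p -OldText '2020:12.28' -NewText '2020:12:28'
        }
    }

    # Show the plan for a manual check.
    $plan | Format-Table -AutoSize
    ```

    Reviewing this table first makes it easier to spot a wrong search string before any copy is started.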


    I hope this helps!

    Kindly mark the answer as Accepted and upvote it if it helped!

    Regards