How to transform files in subfolders with one script in Databricks

reddy 41 Reputation points
2020-08-14T16:55:40.797+00:00

I have an ADLS Gen2 folder with subfolders containing Parquet files. My requirement is to transform all Parquet files in the subfolders and load them into another folder in ADLS Gen2 with the same folder structure, using one script. Is this possible, or do I need multiple scripts?

Can we do this in Databricks or ADF?
Please suggest.
Thanks

1 answer

  1. HarithaMaddi-MSFT 10,136 Reputation points
    2020-08-17T11:59:49.603+00:00

    Hi @reddy,

    Welcome to Microsoft Q&A Platform.

    This can be implemented in both Azure Databricks and Azure Data Factory. There is no straightforward built-in approach in Databricks (you would need to enumerate the subfolders yourself using SDKs or file-system utilities), so Data Factory is the more feasible option here, as per my understanding.
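    For completeness, here is a minimal sketch of what that scripted route could look like in a Databricks notebook (where `spark` and `dbutils` are predefined). The storage account, container, and transformation below are placeholders, and it only walks one level of subfolders; deeper nesting would need recursion:

    ```python
    from pyspark.sql import functions as F

    # Placeholder paths -- adjust to your storage account and container.
    src_root = "abfss://data@youraccount.dfs.core.windows.net/input"
    dst_root = "abfss://data@youraccount.dfs.core.windows.net/output"

    for folder in dbutils.fs.ls(src_root):
        if not folder.isDir():
            continue

        df = spark.read.parquet(folder.path)

        # Placeholder transformation -- replace with your own logic.
        transformed = df.withColumn("loaded_at", F.current_timestamp())

        # folder.name keeps its trailing slash, so this mirrors the
        # source subfolder under the output root.
        transformed.write.mode("overwrite").parquet(dst_root + "/" + folder.name)
    ```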

    Below is one approach in Data Factory: use a Get Metadata activity to get the list of subfolders in the source folder structure and pass it to a ForEach activity. Inside the ForEach activity, a mapping Data Flow gives you the ability to apply many transformations, as shown in the attached animation (17944-transformationdataflow.gif).
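    If you want to verify what the Get Metadata activity's childItems field will hand to the ForEach activity, the same listing can be reproduced in Python with the azure-storage-file-datalake package. The account URL, container name, and credential below are placeholders:

    ```python
    from azure.storage.filedatalake import DataLakeServiceClient

    # Placeholder connection details -- use your own account and credential.
    service = DataLakeServiceClient(
        account_url="https://youraccount.dfs.core.windows.net",
        credential="<account-key>")

    fs = service.get_file_system_client("data")

    # First-level entries under the source folder: the items the
    # ForEach activity would iterate over.
    for item in fs.get_paths(path="input", recursive=False):
        if item.is_directory:
            print(item.name)
    ```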

    Hope this helps! Please let us know if our understanding is incorrect or if you have further queries, and we will be glad to assist further.

    Please do consider clicking "Accept Answer" and "Up-vote" on the post that helps you, as it can be beneficial to other community members.