How to transform all files in a folder and export them as separate files in one notebook

reddy 41 Reputation points
2020-08-18T03:02:22.36+00:00

I have an ADLS Gen2 folder with multiple Parquet files that share the same structure. I want to transform each file separately with one script in a single notebook, convert each file to CSV, and write the results to another folder in ADLS.
How can I achieve this?

Let's say there are 10 files in ADLS. I want to do this:

ADLS Gen2 folder A ---> read and transform in one Databricks notebook --> write output to folder B in ADLS
10 Parquet files, each converted separately (no merging) to CSV format (10 CSV files)

@AmanpreetSingh-MSFT @PRADEEPCHEEKATLA-MSFT @HarithaMaddi-MSFT

Azure Databricks

Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 89,376 Reputation points Microsoft Employee
    2020-08-18T07:00:58.827+00:00

    Hello @reddy ,

    Here are the steps to convert Parquet files to csv format in a notebook:

    Parquet files in an Azure Data Lake Gen2 folder named azure:

    18234-image.png

    Step 1: You can access the Azure Data Lake Gen2 storage account in Databricks using any one of the methods from this document.

    I'm accessing the ADLS Gen2 folder using the storage account access key.

     spark.conf.set("fs.azure.account.key.<storage-account-name>.dfs.core.windows.net", "<storage-account-access-key-name>")
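
As a minimal sketch of the same configuration step, the helpers below build the config name and an `abfss://` URI for a folder. The function names, the container argument, and the secret scope shown in the comments are hypothetical and only for illustration; `spark` and `dbutils` are assumed to be the objects a Databricks notebook already provides.

```python
def account_key_conf(storage_account):
    """Return the Spark config name that holds the access key for an ADLS Gen2 account."""
    return f"fs.azure.account.key.{storage_account}.dfs.core.windows.net"


def abfss_uri(container, storage_account, path=""):
    """Build an abfss:// URI for a path inside an ADLS Gen2 container."""
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/{path.lstrip('/')}"


def configure_access(spark, storage_account, access_key):
    """Register the storage account access key on the given Spark session."""
    spark.conf.set(account_key_conf(storage_account), access_key)


# Possible use in a notebook (placeholders left unfilled; the secret scope is an assumption):
# key = dbutils.secrets.get(scope="<scope-name>", key="<secret-name>")
# configure_access(spark, "<storage-account-name>", key)
# folder_a = abfss_uri("<container>", "<storage-account-name>", "A")
```

Reading the key from a secret scope rather than pasting it into the notebook is one option; the plain string form shown above works the same way.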
    

    Step 2: Using Spark, you can convert Parquet files to CSV format as shown below.

    18256-image.png
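
The code in the screenshot is not reproduced in the text, so here is a rough sketch of one way to handle the per-file requirement from the question: list the Parquet files in folder A, read each one on its own, apply a transformation, and write each result as CSV under folder B. The names `csv_target`, `convert_folder`, and the `transform` argument are hypothetical, and this is not necessarily the approach shown in the image. It assumes the notebook's built-in `spark` and `dbutils` objects are passed in.

```python
def csv_target(parquet_path, output_dir):
    """Map .../name.parquet to <output_dir>/name (Spark writes CSV output as a folder)."""
    name = parquet_path.rstrip("/").rsplit("/", 1)[-1]
    stem = name[:-len(".parquet")] if name.endswith(".parquet") else name
    return f"{output_dir.rstrip('/')}/{stem}"


def convert_folder(spark, dbutils, input_dir, output_dir, transform=lambda df: df):
    """Convert each Parquet file in input_dir to its own CSV output under output_dir."""
    written = []
    for info in dbutils.fs.ls(input_dir):
        if not info.name.endswith(".parquet"):
            continue  # skip marker files such as _SUCCESS
        # Read one file at a time so the outputs are never merged.
        df = transform(spark.read.parquet(info.path))
        target = csv_target(info.path, output_dir)
        # coalesce(1) yields a single part file inside each output folder.
        df.coalesce(1).write.mode("overwrite").option("header", "true").csv(target)
        written.append(target)
    return written


# Possible use in a notebook (paths are placeholders):
# convert_folder(
#     spark, dbutils,
#     "abfss://<container>@<storage-account-name>.dfs.core.windows.net/A",
#     "abfss://<container>@<storage-account-name>.dfs.core.windows.net/B",
#     transform=lambda df: df.dropDuplicates(),
# )
```

With 10 Parquet files in folder A, this produces 10 separate CSV outputs in folder B, one per input file.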

    CSV files in an Azure Data Lake Gen2 folder named csv files:

    18246-image.png
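
One thing to be aware of: Spark writes each CSV output as a folder containing one or more `part-*.csv` files, not as a single file. If a plain `name.csv` file per input is needed, something like the following could be used afterwards. It is only a sketch: `flatten_csv_output` is a hypothetical helper, it assumes the output was written uncompressed with a single partition (for example via `coalesce(1)`), and `final_path` must sit outside the output folder.

```python
def flatten_csv_output(dbutils, output_folder, final_path):
    """Move the single part-*.csv file out of a Spark output folder, then delete the folder."""
    parts = [
        f for f in dbutils.fs.ls(output_folder)
        if f.name.startswith("part-") and f.name.endswith(".csv")
    ]
    if len(parts) != 1:
        raise ValueError(
            f"expected exactly one part file in {output_folder}, found {len(parts)}"
        )
    dbutils.fs.mv(parts[0].path, final_path)
    # Remove the now-redundant folder (and its _SUCCESS marker) recursively.
    dbutils.fs.rm(output_folder, True)
    return final_path


# Possible use in a notebook (paths are placeholders):
# flatten_csv_output(dbutils, "<folder-B>/file1", "<folder-B>/file1.csv")
```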

    Hope this helps. Do let us know if you have any further queries.

    ----------------------------------------------------------------------------------------

    Do click on "Accept Answer" and upvote the post that helps you; this can be beneficial to other community members.

