Transforming JSON files using data flow

A K 20 Reputation points
2024-11-07T15:23:15.93+00:00

Hello!

I currently have about 60 JSON files inside a blob container, and most of them have different fields and values.

I have created a pipeline with a Get Metadata activity that points to the container, with the Field list set to Child items. I have also created a parameter in the source dataset called fileName and set its value to @item().name in the Get Metadata activity settings. I have then connected this to a ForEach activity with Items set to @activity('Get Metadata1').output.childItems. Inside the ForEach activity I have placed a data flow that removes the header and footer and flattens the nested JSON value.

I also have a data flow parameter, filenameparam, whose value is again @item().name.
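For reference, the ForEach wiring described above corresponds roughly to this pipeline JSON (activity names are illustrative, the inner activities are omitted for brevity, and the exact shape may differ slightly from what the ADF authoring UI generates):

```json
{
  "name": "ForEach1",
  "type": "ForEach",
  "dependsOn": [
    { "activity": "Get Metadata1", "dependencyConditions": [ "Succeeded" ] }
  ],
  "typeProperties": {
    "items": {
      "value": "@activity('Get Metadata1').output.childItems",
      "type": "Expression"
    },
    "activities": []
  }
}
```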

The problem I am facing now is that the data flow outputs the transformed data in multiple part files with system-generated names (as in the screenshot below): instead of the original 60 JSON files, I now have 150+ partition files.

Can anyone please advise on how I can change the configuration so that the output file names match the original file names, and so that it outputs 60 files with just the header/footer removed and the value flattened, without the partitioning?

Thank you in advance!


1 answer

  1. Keshavulu Dasari 1,080 Reputation points Microsoft Vendor
    2024-11-07T20:00:18.73+00:00

    Hi A K,
    Welcome to Microsoft Q&A Forum, thank you for posting your query here!
    The issue you’re encountering, with the output split into multiple part files, is almost certainly due to the default partitioning behavior in your data flow. To ensure that your output files retain their original names and are not partitioned:

    1. Disable Partitioning in Data Flow:

    • In your data flow, go to the Sink transformation.
    • Under the Settings tab, look for the Partitioning section.
    • Set the Partitioning option to Single partition. This will ensure that the data is not split into multiple files.

    2. Set Output File Names:

    • In the Sink transformation, under the Settings tab, you can specify the output file name.
    • Use the data flow parameter filenameparam to set the output file name: choose the Output to single file option and set the file name with the data flow expression $filenameparam. (Data flow expressions reference parameters with a $ prefix rather than the pipeline @ syntax, and because item().name already includes the .json extension, there is no need to append '.json' again.)

    3. Ensure Consistent File Naming:

    • Make sure that the filenameparam is correctly passed from the ForEach activity to the Data Flow.
    • In the ForEach activity, ensure that the Items property is set to @activity('Get Metadata1').output.childItems.
    • Inside the ForEach activity, in the Data Flow activity, map the parameter filenameparam to @item().name.
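    The parameter mapping in step 3 would look roughly like this in the Execute Data Flow activity's pipeline JSON (a sketch only: the data flow and activity names are illustrative, and note that string-typed data flow parameters are passed as quoted data flow expressions, hence the single quotes around the @{...} interpolation):

    ```json
    {
      "name": "Data flow1",
      "type": "ExecuteDataFlow",
      "typeProperties": {
        "dataFlow": {
          "referenceName": "RemoveHeaderFooterFlatten",
          "type": "DataFlowReference",
          "parameters": {
            "filenameparam": {
              "value": "'@{item().name}'",
              "type": "Expression"
            }
          }
        }
      }
    }
    ```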

    Brief steps:

    1. Disable Partitioning: Set the partitioning option to single partition in the Sink transformation.
    2. Set Output File Names: Use the filenameparam to set the output file name in the Sink transformation.
    3. Ensure Parameter Mapping: Ensure the filenameparam is correctly mapped in the For Each activity.

    By following these steps, you should be able to output the transformed data as 60 files with the same names as the original files, without the unwanted partitioning.

    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you, as this can be beneficial to other community members.


    If you have any other questions or are still running into issues, let me know in the comments and I would be happy to help.

