How to flatten nested json array in ADF where the first node does not have key and starts with values which is not constant

Satish Hadapad 0 Reputation points
2024-07-08T17:22:36.6666667+00:00

I am trying to parse nested JSON in my pipeline to store the values such as table_name, lastSuccessfulWriteTimestamp, totalProcessedRecordsCount, dataFilesPath, schemaHistory_value1, schemaHistory_value2 from below JSON data using ADF dataflow activity. Can someone help me with this.

{"table1":{"lastSuccessfulWriteTimestamp":"1711559970000","totalProcessedRecordsCount":1709,"dataFilesPath":"pathto_table1_data_files","schemaHistory":{"ef1a14a265934678a3cec646f461019d":"1705424678051"}},

"table2":{"lastSuccessfulWriteTimestamp":"1711559970000","totalProcessedRecordsCount":1709,"dataFilesPath":"pathto_table2_data_files","schemaHistory":{"ef1a14a265934678a3cec646f461019d":"1705424678051"}},

.

.

.

.

"tablen":{"lastSuccessfulWriteTimestamp":"1711559970000","totalProcessedRecordsCount":1709,"dataFilesPath":"pathto_tablen_data_files","schemaHistory":{"ef1a14a265934678a3cec646f461019d":"1705424678051"}}

}

Here, table1 to tablen are table names and I want to extract the data for all tables in columns mentioned above so that i can store it into SQL database.

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
10,128 questions
{count} votes

2 answers

Sort by: Most helpful
  1. Azar 22,035 Reputation points MVP
    2024-07-08T17:43:52.39+00:00

    Hi there Satish Hadapad

    Thanks for using QandA platform'

    you can use the Data Flow activity,

    Use a JSON dataset to load your JSON data.

    Create a new Data Flow and add the source dataset.

    To parse the nested JSON, add a Derived Column transformation. Here, you can extract lastSuccessfulWriteTimestamp, totalProcessedRecordsCount, dataFilesPath, and schemaHistory.

    Add a Flatten transformation to handle the nested schemaHistory object.

    Finally, write the flattened data to your SQL database using a Sink transformation.

    If this helps kindly accept the response thanks much.


  2. Smaran Thoomu 12,615 Reputation points Microsoft Vendor
    2024-07-11T05:29:02.5366667+00:00

    Hi @Satish Hadapad

    As I understand, you want to flatten the nested JSON array in ADF. You can achieve this using the Flatten transformation in the Data Flow activity. Given that the first node doesn't have a key and starts with non-constant values, you'll need to use a combination of transformations to get the desired output.

    You can follow the below steps to flatten the JSON data:
    Step1: Use the Parse JSON transformation to parse the JSON data into a hierarchical structure.

    Step2: Use Flatten transformation with Map

    Add a Flatten transformation and set the Unpivot option to Map. This will allow you to flatten the JSON data while preserving the table names as keys.

    In the Flatten settings, set the following:

    • Input: The parsed JSON data from Step 1
    • Unpivot: Map
    • Map keys: table* (this will capture all table names as keys)
    • Map values: lastSuccessfulWriteTimestamp, totalProcessedRecordsCount, dataFilesPath, schemaHistory

    Step 3: Flatten the schemaHistory array

    Add another Flatten transformation to flatten the schemaHistory array. Set the following:

    • Input: The output from Step 2
    • Unpivot: Array
    • Array: schemaHistory
    • Columns: schemaHistory_value1, schemaHistory_value2 (you can add more columns if needed)

    Step 4: Select and rename columns

    Use the Select transformation to select the desired columns and rename them as needed. For example:

    • table_name: The table name (e.g., table1, table2, etc.)
    • lastSuccessfulWriteTimestamp: The timestamp value
    • totalProcessedRecordsCount: The count value
    • dataFilesPath: The path value
    • schemaHistory_value1: The first schema history value
    • schemaHistory_value2: The second schema history value

    Step 5: Sink the data to your SQL database

    Finally, use the Sink transformation to write the flattened data to your SQL database.

    By following these steps, you should be able to flatten the nested JSON array and extract the desired columns for each table.

    I hope this helps. Please let me know if you have any questions.