null columns in source

Vineet S 910 Reputation points
2024-07-06T09:00:35.7666667+00:00

Hi

How do I handle nulls in source tables in ADF? It is not showing column names for columns with blank data. The source is in Parquet format.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Amira Bedhiafi 24,531 Reputation points
    2024-07-06T11:32:26.8633333+00:00

    First, check the schema of your Parquet files to confirm which columns are present and what their data types are, even if those columns contain only null values. You can do this with tools such as PySpark, Databricks, or any Parquet file viewer.

    ADF Data Flows can also handle schema drift and null values:

    1. Create a Data Flow:
      • In ADF, go to the "Author" tab and create a new Data Flow.
    2. Add Source Transformation:
      • Add a Source transformation and connect it to your Parquet file dataset.
      • In the Source settings, ensure you have "Allow schema drift" enabled. This setting allows the Data Flow to handle columns that may have null values or are missing in some files.
    3. Add Derived Column (if needed):
      • If you need to handle null values specifically (e.g., replace nulls with a default value), add a Derived Column transformation.
      • In the Derived Column transformation, you can use expressions to handle null values, like iif(isNull(columnName), 'default_value', columnName).
    4. Mapping:
      • In the Sink transformation, ensure the schema mapping is set up correctly. You may want to enable "Auto Mapping" to let ADF handle the mapping based on available columns.
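To make the Derived Column expression from step 3 concrete, here is a rough Python equivalent of `iif(isNull(columnName), 'default_value', columnName)`. This is only an illustration of the logic, not ADF code; the rows, the `city` column, and the helper names are hypothetical.

```python
def replace_null(value, default):
    """Mirror iif(isNull(value), default, value): keep the value unless it is null."""
    return default if value is None else value


def fill_nulls(rows, defaults):
    """Return new rows where None in a column listed in `defaults` is replaced."""
    return [
        {col: replace_null(val, defaults[col]) if col in defaults else val
         for col, val in row.items()}
        for row in rows
    ]


# Hypothetical input rows; the second row has a null city.
rows = [
    {"id": 1, "city": "Pune"},
    {"id": 2, "city": None},
]
cleaned = fill_nulls(rows, {"city": "default_value"})
print(cleaned)
```

Only true nulls are replaced here; empty strings and zeros pass through unchanged, which matches what a null check does as opposed to a "blank" check.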

    For Parquet files, schema evolution (changes in the schema over time) can cause issues if it is not handled properly. In ADF Data Flows, you can manage schema evolution by enabling schema drift in the Source transformation or by dynamically mapping columns in the Sink transformation, so that all columns, even those containing only nulls, are included.
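The idea of keeping every column across files whose schemas differ can be sketched in plain Python. This is not how ADF implements schema drift internally; it is only an illustration under the assumption that each file is a list of row dictionaries. The file contents and function names are hypothetical.

```python
def union_columns(batches):
    """Collect column names across all batches, preserving first-seen order."""
    seen = {}
    for batch in batches:
        for row in batch:
            for col in row:
                seen.setdefault(col, None)
    return list(seen)


def align_to_schema(batches, columns):
    """Flatten batches into rows that all carry every column; gaps become None."""
    return [
        {col: row.get(col) for col in columns}
        for batch in batches
        for row in batch
    ]


# Hypothetical files: the newer one gained an "email" column that is still null.
file_a = [{"id": 1, "name": "A"}]
file_b = [{"id": 2, "name": "B", "email": None}]

columns = union_columns([file_a, file_b])
aligned = align_to_schema([file_a, file_b], columns)
print(columns)
print(aligned)
```

Rows from the older file gain the new column with a null value instead of the column being dropped, which is the outcome the Sink mapping is meant to preserve.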

