Dataflow and copy activity upsert behavior

Matthew Samyn 26 Reputation points
2024-07-16T06:47:47.7366667+00:00

Hi,

We are using the Azure Synapse to copy data from our datalakehouse (adls gen2 storage) to Dataverse tables. We're using the Dynamics 365 Linked service.

To do this we are making use of a combination of data flows and copy steps, using 'upsert' as our copy-step behavior.

Copy-activity:

User's image

Data flow activity:

User's image

User's image

It has come to our attention that every night, when we run these activities, Dataverse gets overloaded with new data that we write towards it. However, I was under the assumption that 'upsert' only updates the row that have gotten a change and inserts the fully new rows. Now it seems like these activities are writing their data to Dataverse even if no change was made since last time.

My question: Is below statement true or false and please give more explanation on why it's like that.

Upsert:

  • Inserts new rows
  • Updates only the existing rows where a column value has changed.
Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,839 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
10,519 questions
0 comments No comments
{count} votes

Accepted answer
  1. Chandra Boorla 1,335 Reputation points Microsoft Vendor
    2024-07-16T07:54:59.8233333+00:00

    Hi @Matthew Samyn
    Thanks for the question and using MS Q&A platform.

    In Azure Synapse, the ‘upsert’ operation typically refers to an operation that either inserts new rows based on a specified key or updates existing ones. However, it’s important to note that the ‘upsert’ operation in Azure Synapse modifies all records irrespective of the changes in data. This means that even if no changes were made to the data, the ‘upsert’ operation will still overwrite the existing data in the target table.
    The upsert behavior is designed to ensure data consistency and integrity. By updating the entire row, it ensures that the data in Dynamics 365 is always in sync with the source data. However, this behavior can lead to performance issues and data overload, especially if you're dealing with large datasets.

    Here's what happens behind the scenes:

    • The copy activity reads the data from the source (ADLS Gen2 storage).
    • It then compares the source data with the existing data in Dataverse based on the key columns specified in the Dynamics 365 Linked service.
    • If a matching row is found, the entire row is updated in Dataverse, even if only one column has changed. This means that all columns, not just the changed ones, are overwritten with the new values from the source data.
    • If no matching row is found, a new row is inserted into Dataverse.

    This behavior can lead to the issue you're experiencing, where Dataverse gets overloaded with new data every night, even if no changes were made since the last run. To achieve the desired behavior of only updating rows where a column value has changed, you might need to consider using a different approach, such as:

    • Using a merge statement in a data flow activity to update only the changed columns.
    • Implementing a change data capture (CDC) mechanism to track changes in your source data and only update the changed rows in Dataverse.
    • Using a staging table in Dataverse to store the updated data and then use a stored procedure or a data flow activity to update the target table only where changes have occurred.

    I hope this information helps, please do let us know if you have any further queries.

    1 person found this answer helpful.

0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.