Upsert data into SQL from a Delta table

Rocky420 21 Reputation points
2020-06-26T17:49:42.503+00:00

Hello Team,

We have a scenario where we have to get the data from the lake, process it, and then store it in a SQL database. This is what we are doing:

  1. Read the entity from the lake.
  2. Store that in a Delta table (_staging).
  3. Do a merge between the Delta table and its _staging copy, so that only the new data needs to be updated (a sketch of this step follows the list).
  4. Upsert the same data into the SQL warehouse.
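
For reference, step 3 is roughly the following (simplified; the paths and the join key `id` are placeholders, not our real names):

    import io.delta.tables.DeltaTable

    // Step 3 (simplified): merge the _staging table into the target Delta table.
    // The paths and the join key `id` are placeholders.
    val target  = DeltaTable.forPath(spark, "/mnt/delta/entity")
    val staging = spark.read.format("delta").load("/mnt/delta/entity_staging")

    target.as("t")
      .merge(staging.as("s"), "t.id = s.id")
      .whenMatched().updateAll()
      .whenNotMatched().insertAll()
      .execute()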

I am not able to find how to do the 4th step, i.e., how to send the new changes (inserts, deletes, updates) to the SQL database.

Any pointers are highly appreciated

Azure Databricks

2 answers

  1. ChiragMishra-MSFT 956 Reputation points
    2020-06-30T08:52:24.937+00:00

    Hi @Rocky420,

    Sorry for the delayed response. You can set up "incremental copy" to copy only the delta data (new/updated/deleted rows) in a couple of ways. Here are the ones that best suit your case:

    1. Delta data loading from a database by using a watermark:
      In this case, you define a watermark in your source database. A watermark is a column that holds the last-updated timestamp or an incrementing key (a minimal sketch of this pattern follows the list).
    2. Since you have a SQL-based source (the staging table), you can leverage Change Tracking technology. It is a lightweight solution in SQL Server and Azure SQL Database that provides an efficient change-tracking mechanism for applications, enabling an application to easily identify data that was inserted, updated, or deleted.
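
    For illustration, here is a minimal sketch of the watermark pattern from the Spark side (the column `last_modified` and the paths are assumptions, not names from your pipeline):

        import org.apache.spark.sql.functions.{col, max}

        // Watermark sketch: `last_modified` and the paths are assumed names.
        // 1. Recover the high watermark from the data already merged.
        val lastWatermark = spark.read.format("delta")
          .load("/mnt/delta/entity")
          .agg(max("last_modified"))
          .first()
          .getTimestamp(0)

        // 2. Keep only the rows that changed since that watermark; these are
        //    the inserts/updates that still need to go to SQL.
        val changedRows = spark.read.format("delta")
          .load("/mnt/delta/entity_staging")
          .filter(col("last_modified") > lastWatermark)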

    To learn more, please refer to this doc on incrementally copying data.

    Hope this helps.


  2. Rajkumar V 1 Reputation point
    2020-06-30T19:30:28.833+00:00

    Hi Rocky,

    In your target Delta table, add a "last action" field and a "last action date" field to capture the updates from the merge operation.

    Using the watermark, you can either upload all the data at once to a staging table in SQL and do a SQL MERGE operation, or you can trigger insert/update/delete queries from Databricks (a sketch of the first option follows).
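
    For the first option, the server-side merge could look roughly like this (a sketch only; the table and column names, including the LastAction flag, are assumptions), submitted through "queryCustom" exactly like the example below:

        // Sketch: upsert driven by the last-action flag suggested above.
        // dbo.Target, dbo.Target_Staging, Id, Value, and LastAction are assumed names.
        val mergeStmt = """
          MERGE INTO dbo.Target AS t
          USING dbo.Target_Staging AS s
            ON t.Id = s.Id
          WHEN MATCHED AND s.LastAction = 'D' THEN DELETE
          WHEN MATCHED THEN UPDATE SET Value = s.Value
          WHEN NOT MATCHED THEN INSERT (Id, Value) VALUES (s.Id, s.Value);
        """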

    To trigger queries directly, here is an example:

    import com.microsoft.azure.sqldb.spark.config.Config
    import com.microsoft.azure.sqldb.spark.connect._
    import com.microsoft.azure.sqldb.spark.query._

    // Custom statement to run against the database (replace with a call to
    // your merge stored procedure).
    val TruncStmt = "TRUNCATE TABLE TABLENAME"

    val Password  = "xxxx"
    val dwServer  = "xxxx"
    val dbname    = "xxxx"
    val Username  = "xxxx"
    val tableName = "xxxx"

    val config = Config(Map(
      "url"          -> dwServer,
      "databaseName" -> dbname,
      "queryCustom"  -> TruncStmt,
      "user"         -> Username,
      "password"     -> Password
    ))

    // Execute the custom statement on the SQL side.
    sqlContext.sqlDBQuery(config)
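
    To cover the "upload all at once" path with the same connector, the changed rows can be appended to a SQL staging table first, and then the merge statement is executed server-side (a sketch; `stagingDF` and the table name are assumptions):

        import org.apache.spark.sql.SaveMode
        import com.microsoft.azure.sqldb.spark.config.Config
        import com.microsoft.azure.sqldb.spark.connect._

        // Sketch: append the changed rows to a SQL staging table, then run
        // the merge via "queryCustom" as above. `stagingDF` is assumed to
        // hold the changed rows; the table name is a placeholder.
        val writeConfig = Config(Map(
          "url"          -> dwServer,
          "databaseName" -> dbname,
          "dbTable"      -> "dbo.Target_Staging",
          "user"         -> Username,
          "password"     -> Password
        ))

        stagingDF.write.mode(SaveMode.Append).sqlDB(writeConfig)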
    