Is there any limitation on writing billions of records from Azure Databricks to Azure SQL using JDBC?

manish verma 441 Reputation points
2020-10-08T14:37:03.543+00:00

Hi All,

We are following this URL:
https://video2.skills-academy.com/en-us/azure/databricks/data/data-sources/sql-databases

We want to know whether there is any limitation on writing billions of records from Azure Databricks to Azure SQL using JDBC.

Our requirement is very simple: we want to write DataFrame data to an Azure SQL table.

3 answers

  1. HimanshuSinha-msft 19,476 Reputation points Microsoft Employee
    2020-10-08T23:39:27.717+00:00

    Hello @manish verma,

    Thanks for the ask and for using the forum.

    Well, ADB is architected in such a way that it can read a lot of data at the same time using executors. On the SQL-side write, though, you should plan ahead and implement partitions; otherwise, inserting huge amounts of data into the same table at the same time can turn out to be a nightmare. There is no limitation as such.
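
    For illustration, here is a minimal PySpark sketch of what that can look like with the generic JDBC writer; the server, database, table name, secret scope, and partition count are placeholders and assumptions, not values from this thread:

    ```python
    # Repartition so the insert load is spread across many tasks instead of one writer.
    jdbc_url = "jdbc:sqlserver://<server>.database.windows.net:1433;database=<database>"  # placeholder

    (df.repartition(64)                          # illustrative value; tune to cluster and target capacity
       .write
       .format("jdbc")
       .option("url", jdbc_url)
       .option("dbtable", "dbo.target_table")    # hypothetical table name
       .option("user", dbutils.secrets.get("my-scope", "sql-user"))         # assumed secret scope/keys
       .option("password", dbutils.secrets.get("my-scope", "sql-password"))
       .option("batchsize", 10000)               # rows per JDBC batch insert
       .mode("append")
       .save())
    ```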

    Thanks, Himanshu
    Please do consider clicking "Accept Answer" and "Up-vote" on the post that helps you, as it can be beneficial to other community members.


  2. David Browne - msft 3,846 Reputation points
    2020-10-09T19:10:35.64+00:00

    There are purpose-built Spark Connectors for SQL Server and Azure SQL Database and for Azure Synapse SQL Pools (SQL DW).

    You should use these for large loads, rather than the generic JDBC Spark Connector. But the JDBC Connector will work; it just may be slower than the purpose-built ones.
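
    For example, an append through the purpose-built connector looks roughly like the sketch below; the connector (format "com.microsoft.sqlserver.jdbc.spark") must be installed on the cluster, and the server, database, table name, and credentials shown are placeholders:

    ```python
    # Hedged sketch: append a DataFrame to Azure SQL via the purpose-built Spark connector.
    (df.write
       .format("com.microsoft.sqlserver.jdbc.spark")
       .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<database>")
       .option("dbtable", "dbo.target_table")    # hypothetical table name
       .option("user", "<user>")
       .option("password", "<password>")
       .mode("append")
       .save())
    ```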


  3. Ragunathan Ramanujam 6 Reputation points
    2020-10-11T12:48:47.307+00:00

    Hi @Verma, Manish Kumar / @manish verma

    With all due respect, I would like to share some of my thoughts about your requirement. My understanding is that you are reading billions of records (assuming the source data arrives as files) using Databricks and writing them into SQL Server Hyperscale.
    I cannot see any limitation in the JDBC driver for writing billions of records; the issues start when you cannot fit all of your source records into your Databricks Spark cluster memory, or when the writing technique is poorly chosen. As developers, we need to come up with better design patterns to handle large data volumes inside Databricks. One of the better design patterns, which @HimanshuSinha-msft provided, is partitioning the data: it splits your data into multiple partitions and also improves writing speed, since each partition is written independently.
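
    As a small, hedged illustration (the DataFrame name `df` is assumed), you can check how the data is partitioned before writing and spread it out if a single task would otherwise hold too much of it:

    ```python
    # Inspect the current partition count of the source DataFrame.
    print(df.rdd.getNumPartitions())

    # If there are too few partitions for the volume, spread the rows out before writing;
    # 200 is an arbitrary illustrative number, not a recommendation.
    df = df.repartition(200)
    ```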

    Again, with due respect, I am not a Microsoft employee, but I have not seen Microsoft release any partially tested components without relevant documentation for them. We will all try to help if you share the error that occurs when you load large volumes using Databricks, so that a better design approach can be found.

