Datetime2 error when querying parquet files produced by a Synapse copy activity from a Business Central OData source (dates appear to change from 0001-01-01 to 0000-12-30)

Victor Chu
2024-07-24T15:13:17.94+00:00

I have a pipeline running a copy activity that copies data from a Business Central OData source to a parquet file every day. Everything worked fine up to 16/07/2024, but a problem has arisen starting from 17/07/2024: the parquet file can no longer be queried from Synapse, failing with the following error: **Error encountered while parsing data: 'Inserting value to batch for column type DATETIME2 failed. Invalid argument provided.'**

The data contains some dates with the value "0001-01-01". Up to 16/07/2024 these were copied to parquet without any issue, but from 17/07/2024 onward "0001-01-01" is changed to "0000-12-30" by the copy activity, which I believe is the cause of the issue.

To investigate those parquet files, I used PySpark with the following Spark settings:

# CORRECTED = read/write the stored values as-is (proleptic Gregorian calendar), with no rebasing
spark.conf.set("spark.sql.parquet.int96RebaseModeInRead", "CORRECTED")
spark.conf.set("spark.sql.parquet.int96RebaseModeInWrite", "CORRECTED")
spark.conf.set("spark.sql.parquet.datetimeRebaseModeInRead", "CORRECTED")
spark.conf.set("spark.sql.parquet.datetimeRebaseModeInWrite", "CORRECTED")

For the same data:

  • parquet files from up to 16/07/2024 show "0001-01-01"
  • parquet files from 17/07/2024 onward show "0000-12-30"
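
A minimal sketch of that inspection, for reference (the file paths and the "Posting_Date" column name are hypothetical placeholders for my actual data):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Same settings as above: read the stored values as-is, with no calendar rebasing.
spark.conf.set("spark.sql.parquet.int96RebaseModeInRead", "CORRECTED")
spark.conf.set("spark.sql.parquet.datetimeRebaseModeInRead", "CORRECTED")

before = spark.read.parquet("/data/bc/2024-07-16/table.parquet")
after = spark.read.parquet("/data/bc/2024-07-17/table.parquet")

# show() formats dates on the JVM side, so years before 0001 display safely.
before.select("Posting_Date").distinct().show()   # 0001-01-01
after.select("Posting_Date").distinct().show()    # 0000-12-30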

Is there any problem with the copy activity in Synapse that changes the date from "0001-01-01" to "0000-12-30"?

Thank you very much for the help.

Tags: Azure Synapse Analytics, Azure Data Factory

2 answers

  1. Bhargava-MSFT (Microsoft Employee)
    2024-07-25T15:10:23.2333333+00:00

    After further checking with my internal team: there are known issues with date handling in Spark 3, which switched to the proleptic Gregorian calendar. The two-day shift you are seeing (0001-01-01 becoming 0000-12-30) matches the offset between the Julian and proleptic Gregorian calendars in that era.

    You can find the details of these limitations in the blog post below:

    https://www.databricks.com/blog/2020/07/22/a-comprehensive-look-at-dates-and-timestamps-in-apache-spark-3-0.html
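
    To make the two-day shift concrete: Julian-calendar 0001-01-01 and proleptic-Gregorian 0000-12-30 are the same physical day. A self-contained Python sketch, using the standard Julian Day Number formulas, reproduces exactly the shift seen in the parquet files:

    # Julian-calendar date -> Julian Day Number (standard integer formula)
    def julian_to_jdn(year, month, day):
        a = (14 - month) // 12
        y = year + 4800 - a
        m = month + 12 * a - 3
        return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083

    # Julian Day Number -> proleptic-Gregorian date (Fliegel-Van Flandern algorithm)
    def jdn_to_gregorian(jdn):
        l = jdn + 68569
        n = (4 * l) // 146097
        l = l - (146097 * n + 3) // 4
        i = (4000 * (l + 1)) // 1461001
        l = l - (1461 * i) // 4 + 31
        j = (80 * l) // 2447
        day = l - (2447 * j) // 80
        l = j // 11
        month = j + 2 - 12 * l
        year = 100 * (n - 49) + i + l
        return year, month, day

    jdn = julian_to_jdn(1, 1, 1)    # Julian 0001-01-01 -> JDN 1721424
    print(jdn_to_gregorian(jdn))    # (0, 12, 30), i.e. 0000-12-30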

    We have a workaround for this issue in data flows, where we set the java8datetimeapi config (see the note after the list below).

    This config can be set:

    1. At the subscription level - this will affect all data flows running in the subscription.
    2. At the IR level - only data flows running on that IR will have this behavior, so only the data flows that deal with old dates need to run on this IR.
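
    For reference, this presumably corresponds to the standard Spark 3 flag spark.sql.datetime.java8API.enabled (an assumption; the exact custom-property name exposed on the IR may differ). In a notebook, the equivalent setting would be:

    # Assumption: the data-flow "java8datetimeapi" property maps to this standard
    # Spark 3 config. When enabled, Spark hands dates around as java.time values
    # (proleptic Gregorian) instead of legacy java.sql values backed by the hybrid
    # Julian/Gregorian calendar, so old dates display without the two-day shift.
    spark.conf.set("spark.sql.datetime.java8API.enabled", "true")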

    As general advice, avoid relying on very old dates in data processing. This is not just a Java or Spark limitation; it can happen in any ecosystem that has trouble handling these kinds of dates.

    Without the config, the data preview looks like this: [screenshot]

    With the custom property set at the IR level, it looks like this: [screenshot]

    On the IR, the new custom properties can be set via a drop-down; customers can change the values and assign this new IR to all data flows that require these properties: [screenshot]

    I hope this helps.


  2. Roel Knitel
    2024-07-30T06:07:51.6+00:00

    We are experiencing the same issue, which began on July 16th or 17th, 2024.

    This issue occurs in Synapse when using a copy activity with a Business Central OData source and writing to Parquet files. As a result, Synapse is unable to create views from the Parquet files because datetime2 values of 0001-01-01 cannot be evaluated correctly.

    We suspect that the cause could be a change or update in Business Central, in the OData connector, or possibly (but less likely) in the Spark configuration.

    Could someone from Microsoft please investigate this issue?

    In the meantime, we are working on a workaround that converts all faulty 0001-01-01 values in all tables to a value that Synapse Serverless SQL can evaluate correctly.
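
    For illustration, a minimal PySpark sketch of such a conversion (the storage paths and the "Posting_Date" column name are hypothetical placeholders): it clamps every date below the DATETIME2 minimum of 0001-01-01 back to that minimum and rewrites the file.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Read and write the stored values as-is, without calendar rebasing.
    spark.conf.set("spark.sql.parquet.datetimeRebaseModeInRead", "CORRECTED")
    spark.conf.set("spark.sql.parquet.datetimeRebaseModeInWrite", "CORRECTED")

    df = spark.read.parquet("/data/bc/table.parquet")   # hypothetical path

    # Clamp anything below the DATETIME2 minimum back to 0001-01-01.
    minimum = F.lit("0001-01-01").cast("date")
    fixed = df.withColumn(
        "Posting_Date",   # hypothetical column holding the faulty dates
        F.when(F.col("Posting_Date") < minimum, minimum)
         .otherwise(F.col("Posting_Date")),
    )

    fixed.write.mode("overwrite").parquet("/data/bc/table_fixed.parquet")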

