Datetime2 error when querying parquet files produced by a Synapse copy activity from a Business Central OData source (dates appear to change from 0001-01-01 to 0000-12-30)

Victor Chu
2024-07-24T15:13:17.94+00:00

I have a pipeline running a copy activity that copies data from a Business Central OData source to a parquet file every day. Everything worked fine up to 16/07/2024, but a problem has arisen starting from 17/07/2024: the parquet file can no longer be queried from Synapse, failing with the following error: **Error encountered while parsing data: 'Inserting value to batch for column type DATETIME2 failed. Invalid argument provided.'**

The data contains some dates with the value "0001-01-01". Up to 16/07/2024 these were copied to parquet without any issue, but from 17/07/2024 onward "0001-01-01" is changed to "0000-12-30" by the copy activity, which I believe is the cause of the issue.

To investigate those parquet files, I used PySpark with the following Spark settings:

# CORRECTED = read/write the stored values as-is (proleptic Gregorian calendar), with no rebasing
spark.conf.set("spark.sql.parquet.int96RebaseModeInRead", "CORRECTED")
spark.conf.set("spark.sql.parquet.int96RebaseModeInWrite", "CORRECTED")
spark.conf.set("spark.sql.parquet.datetimeRebaseModeInRead", "CORRECTED")
spark.conf.set("spark.sql.parquet.datetimeRebaseModeInWrite", "CORRECTED")

For the same data:

  • parquet files from up to 16/07/2024 show "0001-01-01"
  • parquet files from 17/07/2024 onward show "0000-12-30"
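
A minimal sketch of that inspection, for reference (the file paths and the "Posting_Date" column name are hypothetical placeholders for my actual data):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Same settings as above: read the stored values as-is, with no calendar rebasing.
spark.conf.set("spark.sql.parquet.int96RebaseModeInRead", "CORRECTED")
spark.conf.set("spark.sql.parquet.datetimeRebaseModeInRead", "CORRECTED")

before = spark.read.parquet("/data/bc/2024-07-16/table.parquet")
after = spark.read.parquet("/data/bc/2024-07-17/table.parquet")

# show() formats dates on the JVM side, so years before 0001 display safely.
before.select("Posting_Date").distinct().show()   # 0001-01-01
after.select("Posting_Date").distinct().show()    # 0000-12-30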

Is there any problem with the copy activity in Synapse that changes the date from "0001-01-01" to "0000-12-30"?

Thank you very much for the help.

Tags: Azure Synapse Analytics, Azure Data Factory

2 answers

  1. Bhargava-MSFT (Microsoft Employee)
    2024-07-25T15:10:23.2333333+00:00

    After further checking with my internal team: there are known issues with date handling in Spark 3, which switched to the proleptic Gregorian calendar. The two-day shift you are seeing (0001-01-01 becoming 0000-12-30) matches the offset between the Julian and proleptic Gregorian calendars in that era.

    You can find the details of these limitations in the blog post below:

    https://www.databricks.com/blog/2020/07/22/a-comprehensive-look-at-dates-and-timestamps-in-apache-spark-3-0.html
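
    To make the two-day shift concrete: Julian-calendar 0001-01-01 and proleptic-Gregorian 0000-12-30 are the same physical day. A self-contained Python sketch, using the standard Julian Day Number formulas, reproduces exactly the shift seen in the parquet files:

    # Julian-calendar date -> Julian Day Number (standard integer formula)
    def julian_to_jdn(year, month, day):
        a = (14 - month) // 12
        y = year + 4800 - a
        m = month + 12 * a - 3
        return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083

    # Julian Day Number -> proleptic-Gregorian date (Fliegel-Van Flandern algorithm)
    def jdn_to_gregorian(jdn):
        l = jdn + 68569
        n = (4 * l) // 146097
        l = l - (146097 * n + 3) // 4
        i = (4000 * (l + 1)) // 1461001
        l = l - (1461 * i) // 4 + 31
        j = (80 * l) // 2447
        day = l - (2447 * j) // 80
        l = j // 11
        month = j + 2 - 12 * l
        year = 100 * (n - 49) + i + l
        return year, month, day

    jdn = julian_to_jdn(1, 1, 1)    # Julian 0001-01-01 -> JDN 1721424
    print(jdn_to_gregorian(jdn))    # (0, 12, 30), i.e. 0000-12-30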

    We have a workaround for this issue in data flows, where we set the java8datetimeapi config (see the note after the list below).

    This config can be set:

    1. At the subscription level - this will affect all data flows running in the subscription.
    2. At the IR level - only data flows running on that IR will have this behavior, so only the data flows that deal with old dates need to run on this IR.
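
    For reference, this presumably corresponds to the standard Spark 3 flag spark.sql.datetime.java8API.enabled (an assumption; the exact custom-property name exposed on the IR may differ). In a notebook, the equivalent setting would be:

    # Assumption: the data-flow "java8datetimeapi" property maps to this standard
    # Spark 3 config. When enabled, Spark hands dates around as java.time values
    # (proleptic Gregorian) instead of legacy java.sql values backed by the hybrid
    # Julian/Gregorian calendar, so old dates display without the two-day shift.
    spark.conf.set("spark.sql.datetime.java8API.enabled", "true")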

    As general advice, avoid relying on very old dates in data processing. This is not just a Java or Spark limitation; it can happen in any ecosystem that has trouble handling these kinds of dates.

    Without the config, the data preview looks like this: [screenshot]

    With the custom property set at the IR level, it looks like this: [screenshot]

    On the IR, the new custom properties can be set via a drop-down; customers can change the values and assign this new IR to all data flows that require these properties: [screenshot]

    I hope this helps.


  2. Roel Knitel
    2024-07-30T06:07:51.6+00:00

    We are experiencing the same issue, which began on July 16th or 17th, 2024.

    This issue occurs in Synapse when using a copy activity with a Business Central OData source and writing to Parquet files. As a result, Synapse is unable to create views from the Parquet files because datetime2 values of 0001-01-01 cannot be evaluated correctly.

    We suspect that the cause could be a change or update in Business Central, in the OData connector, or possibly (but less likely) in the Spark configuration.

    Could someone from Microsoft please investigate this issue?

    In the meantime, we are working on a workaround that converts all faulty 0001-01-01 values in all tables to a value that Synapse Serverless SQL can evaluate correctly.
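
    For illustration, a minimal PySpark sketch of such a conversion (the storage paths and the "Posting_Date" column name are hypothetical placeholders): it clamps every date below the DATETIME2 minimum of 0001-01-01 back to that minimum and rewrites the file.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Read and write the stored values as-is, without calendar rebasing.
    spark.conf.set("spark.sql.parquet.datetimeRebaseModeInRead", "CORRECTED")
    spark.conf.set("spark.sql.parquet.datetimeRebaseModeInWrite", "CORRECTED")

    df = spark.read.parquet("/data/bc/table.parquet")   # hypothetical path

    # Clamp anything below the DATETIME2 minimum back to 0001-01-01.
    minimum = F.lit("0001-01-01").cast("date")
    fixed = df.withColumn(
        "Posting_Date",   # hypothetical column holding the faulty dates
        F.when(F.col("Posting_Date") < minimum, minimum)
         .otherwise(F.col("Posting_Date")),
    )

    fixed.write.mode("overwrite").parquet("/data/bc/table_fixed.parquet")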

