Invalid records failed in DQ checks

Anshal 2,246 Reputation points
2024-05-24T10:55:51.94+00:00

We use Databricks to capture the records that fail DQ checks and write them to Blob storage so that business owners can resolve the inconsistencies. We have added an extra column that holds the reason the DQ check failed. I have the following questions:

What if a particular record fails multiple times? Should that be a soft stop or a hard stop, and which is the better design? I also want to know how many times a specific record has failed. Logically, I could add an extra column and keep the count there. Are there more scalable or better approaches?

Azure Databricks

Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 89,376 Reputation points Microsoft Employee
    2024-05-27T04:46:24.4433333+00:00

    @Anshal - Thanks for the question and for using the MS Q&A platform.

    Whether to use a soft stop or a hard stop for records that fail DQ checks depends on your specific use case and business requirements. If processing should continue even when a record fails multiple times, a soft stop is the better design. If processing should halt completely when a record fails multiple times, a hard stop is the better design.
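As a rough illustration of the two options, the sketch below shows one possible policy: keep soft-stopping a record until it has failed a configurable number of times, then escalate to a hard stop. The names `StopAction`, `choose_action`, and `hard_stop_after` are hypothetical and do not come from Databricks or from the thread; the threshold itself is an assumption you would replace with your own business rule.

```python
from enum import Enum
from typing import Optional


class StopAction(Enum):
    SOFT_STOP = "soft_stop"  # set the record aside and keep processing
    HARD_STOP = "hard_stop"  # halt processing completely


def choose_action(failure_count: int, hard_stop_after: Optional[int] = None) -> StopAction:
    """Pick an action for a record that has failed `failure_count` times.

    `hard_stop_after` is a hypothetical threshold. None means the record
    is always soft-stopped, no matter how often it fails.
    """
    if hard_stop_after is not None and failure_count >= hard_stop_after:
        return StopAction.HARD_STOP
    return StopAction.SOFT_STOP
```

With `hard_stop_after=3`, a record that has failed once or twice is soft-stopped, and the third failure triggers a hard stop. Leaving the threshold unset gives a pure soft-stop design.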

    As for getting the count of a specific record that failed multiple times, adding an extra column to track the count is a logical approach. However, if you are dealing with a large volume of data, maintaining that counter for every record may not scale well. In that case, consider computing the count with Apache Spark, which Databricks already provides, by aggregating over the stored failed records.

    Using Spark's groupBy and count functions, you can group the data by the record ID and count the number of times it failed. This approach is scalable and can handle large volumes of data.

    Hope this helps. Do let us know if you have any further queries.

    If this answers your query, please click Accept Answer and Yes for "Was this answer helpful".
