Immediate restartability on failure of execute pipeline activity

Ankush O Wathodkar 21 Reputation points
2020-10-30T09:39:22.907+00:00

There is a requirement to rerun the master pipeline from the failed Execute Pipeline activity and its dependent tier pipelines. The existing capability lets us rerun activities from the point of failure, but only after the master pipeline has completed its execution. So the ask here is to restart from the point of failure without waiting for the master pipeline to complete the runs for the independent flows. Could you please guide us on this front?

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

2 answers

  1. MartinJaffer-MSFT 26,091 Reputation points
    2020-11-02T20:08:10.12+00:00

    @Ankush O Wathodkar As I mentioned in earlier comments, you can implement pipeline logic to retry the copy activity. Below is an example, including a picture.

    36951-image.png

    In this example, we have 4 activities. One represents whatever activities lead up to the copy you want to retry. Two are copy activities, one for the first try, the other for the second 're-try'. The last represents whatever activities happen after the copy.

    Leading into the first copy from the "Whatever comes before" is a success dependency.

    From the "First attempt" copy activity to the "Second attempt" copy activity is an on-failure dependency.

    The "Second attempt" copy activity is connected to the "Do whatever comes after" activity by both a skipped and success dependency. The skipped dependency handles the case when the "first attempt" succeeds, and the "second attempt" does not get run. The success dependency handles when the "first attempt" fails and the "second attempt" succeeds. Should the "second attempt" fail, the pipeline will halt.

    From the "first attempt" copy activity to the "Do whatever comes after" activity is an on-completion dependency. An on-completion dependency covers both the success and the failure of the "first attempt". I use on-completion instead of both success and failure dependencies for two reasons:

    • it is easier to read
    • on-completion does not cause the pipeline to return failure status. Connecting by both success and failure would cause the pipeline to report failure status even if everything other than "first attempt" ran successfully.

    The way "Do whatever comes after" reads the dependency logic is this:

    Run "Do whatever comes after" if, and only if, the "first attempt" completed AND (the "second attempt" was successful OR the "second attempt" was skipped).
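
    To make the rule above easier to check, here is a minimal Python sketch that models it. This is not Data Factory code: the `DEPENDS_ON` list only imitates the shape of a pipeline's dependency settings, and the helper functions (`condition_met`, `should_run`, `second_attempt_status`, `outcomes`) are hypothetical names made up for illustration. It assumes, as described above, that conditions on the same upstream activity are OR'ed and that different upstream activities are AND'ed.

```python
from itertools import product

# Hypothetical model of the dependencies on "Do whatever comes after".
# The layout imitates a dependency list; it is an illustration only.
DEPENDS_ON = [
    {"activity": "First attempt", "dependencyConditions": ["Completed"]},
    {"activity": "Second attempt", "dependencyConditions": ["Succeeded", "Skipped"]},
]


def condition_met(condition, status):
    """Return True if an activity's final status satisfies one condition."""
    if condition == "Completed":
        # "Completed" covers both outcomes of an activity that actually ran.
        return status in ("Succeeded", "Failed")
    return condition == status


def should_run(depends_on, statuses):
    """AND across upstream activities, OR across each activity's conditions."""
    return all(
        any(condition_met(c, statuses[dep["activity"]])
            for c in dep["dependencyConditions"])
        for dep in depends_on
    )


def second_attempt_status(first_status, retry_succeeds):
    """The second attempt only runs (on-failure) when the first attempt failed."""
    if first_status == "Failed":
        return "Succeeded" if retry_succeeds else "Failed"
    return "Skipped"


def outcomes():
    """Enumerate every combination and whether the downstream activity runs."""
    rows = []
    for first, retry_ok in product(("Succeeded", "Failed"), (True, False)):
        second = second_attempt_status(first, retry_ok)
        statuses = {"First attempt": first, "Second attempt": second}
        rows.append((first, second, should_run(DEPENDS_ON, statuses)))
    return rows


if __name__ == "__main__":
    for first, second, runs in outcomes():
        print(f"first={first:<9} second={second:<9} run_after={runs}")
```

    In this model, the only combination that stops "Do whatever comes after" from running is the one where both attempts fail, which matches the behavior described in the answer.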

    Please let me know if this helps.


  2. Ankush O Wathodkar 21 Reputation points
    2020-11-04T09:31:59.857+00:00

    @MartinJaffer-MSFT Thanks Martin for putting all of this together and shedding light on the possible solutions. However, we are thinking more in terms of failures caused by generic errors (e.g. bad data in the source file, source file unavailability, something going wrong at the Execute Pipeline level, etc.). It's not possible to define in advance the nature of the error that could occur, and hence it's difficult to even estimate the effort needed to fix it before we jump into re-executing the failed activity (in the second attempt). With the approach above, the second attempt would kick off as soon as the failure status for the first one is captured. One upfront resolution would be to add a Wait activity before the second attempt in the flow. However, this would again require us to:

    1. Quantify the time required to fix the error that could occur
    2. Add two extra activities (Wait + Second Attempt) for each of the activities that will be executed in parallel. This makes the design of the pipeline cumbersome, especially for master pipelines with more than 10 activities running in parallel.

    Hence what I am looking for is an option just like "Rerun from failed activity" that could be kicked off as soon as the failed status for even a single activity is captured. This would allow the failed activity, along with its dependent downstream activities, to start executing the moment this new option is selected. Of course, the support person would take care of fixing the error for the failed activity before choosing the new option.

    Thanks for all your help so far!!

