Pipeline Executing Databricks Notebook Successfully Despite Stopped Cluster

vikranth-0706 180 Reputation points
2024-06-20T11:17:32.34+00:00

In Azure Data Factory (ADF), I have a pipeline that executes a notebook in Azure Databricks. I noticed that even when the Databricks cluster is stopped, the ADF pipeline still completes successfully, and the notebook runs without any issues.

Is this behavior normal?

Shouldn’t the cluster need to be running for the notebook to execute successfully?

Does ADF automatically start the cluster when initiating a notebook run?

Could there be a configuration in ADF or Databricks that is allowing this to happen?

Azure Databricks

Accepted answer
  1. Smaran Thoomu 12,100 Reputation points Microsoft Vendor
    2024-06-20T11:40:00.7933333+00:00

    Hi @vikranth-0706

    Thank you for reaching out to the community forum with your query.

    Regarding your questions:

    Is this behavior normal?

    Yes, it's normal behavior in Azure Databricks.

    When ADF executes a Databricks notebook against an existing interactive cluster, it submits the run through the Databricks Jobs REST API. If that cluster is terminated, Databricks starts it automatically before the notebook code runs, which is why your pipeline succeeds even though the cluster appeared to be stopped.

    ADF does not stop the cluster after the notebook finishes, because the same interactive cluster may be serving other notebooks or jobs. If auto-termination is enabled on the cluster, Databricks shuts it down on its own after the configured idle period.

    So, in your case, the cluster was started on demand when the pipeline ran the notebook, and it stays up afterwards until auto-termination (or a manual stop) shuts it down again.
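    You can observe the same check-then-start sequence directly against the Databricks Clusters REST API (`clusters/get` and `clusters/start`). The sketch below is illustrative only: the workspace URL, token, and cluster ID are placeholders you would substitute with your own values.

```python
import json
import urllib.request

DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"  # placeholder


def needs_start(state: str) -> bool:
    """TERMINATED/TERMINATING clusters must be started before use;
    RUNNING, PENDING, and RESTARTING clusters need no action."""
    return state in ("TERMINATED", "TERMINATING")


def _call(path, payload=None):
    """Tiny helper around the Databricks REST API (GET when no
    payload is given, POST otherwise)."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(
        f"{DATABRICKS_HOST}{path}",
        data=data,
        headers={"Authorization": f"Bearer {TOKEN}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def ensure_cluster_running(cluster_id: str) -> None:
    """Do what the service does on a run submission: start the
    cluster only when it is not already up."""
    state = _call(f"/api/2.0/clusters/get?cluster_id={cluster_id}")["state"]
    if needs_start(state):
        _call("/api/2.0/clusters/start", {"cluster_id": cluster_id})
```

    Note that you never need to call this yourself from ADF; it only makes the implicit start step visible.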

    Shouldn’t the cluster need to be running for the notebook to execute successfully?

    The cluster does need to be running for the notebook code to execute, but it does not need to be running when the pipeline starts. The run submission itself triggers a cluster start, so by the time the notebook actually executes the cluster is up; the activity simply takes a few extra minutes while the cluster boots.
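    Under the hood, the Notebook activity is equivalent to a one-time run submission (`POST /api/2.1/jobs/runs/submit`) naming the cluster and the notebook path. A minimal sketch, with placeholder workspace URL and token (not real values):

```python
import json
import urllib.request

DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"  # placeholder


def build_submit_payload(cluster_id: str, notebook_path: str) -> dict:
    """Request body for a one-time notebook run on an existing
    interactive cluster."""
    return {
        "run_name": "adf-style-notebook-run",
        "existing_cluster_id": cluster_id,
        "notebook_task": {"notebook_path": notebook_path},
    }


def submit_notebook_run(cluster_id: str, notebook_path: str) -> int:
    """Submit the run. Databricks starts a terminated cluster itself,
    so the caller never checks cluster state first."""
    req = urllib.request.Request(
        f"{DATABRICKS_HOST}/api/2.1/jobs/runs/submit",
        data=json.dumps(build_submit_payload(cluster_id, notebook_path)).encode(),
        headers={"Authorization": f"Bearer {TOKEN}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["run_id"]
```

    Submitting against a terminated cluster succeeds; the run simply sits in a pending state while the cluster boots.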

    Does ADF automatically start the cluster when initiating a notebook run?

    Yes. When the Databricks linked service references an existing interactive cluster, the run submission causes Databricks to start that cluster automatically if it is terminated. If the linked service is configured with a new job cluster instead, ADF provisions a fresh cluster for each run, and Databricks terminates it when the run finishes.
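    For reference, the cluster choice lives in the Databricks linked service definition. An illustrative example pointing at an existing interactive cluster (all values are placeholders):

```json
{
  "name": "AzureDatabricksLinkedService",
  "properties": {
    "type": "AzureDatabricks",
    "typeProperties": {
      "domain": "https://adb-1234567890123456.7.azuredatabricks.net",
      "accessToken": {
        "type": "SecureString",
        "value": "<personal-access-token>"
      },
      "existingClusterId": "<interactive-cluster-id>"
    }
  }
}
```

    Replacing `existingClusterId` with the new-cluster properties (`newClusterVersion`, `newClusterNodeType`, `newClusterNumOfWorker`) switches the activity to a per-run job cluster instead.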

    Could there be a configuration in ADF or Databricks that is allowing this to happen?

    There is no hidden setting enabling this; it is the default behavior. Which start-up path you get is determined by the cluster option selected in the Databricks linked service: existing interactive cluster, new job cluster, or existing instance pool.

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.
