Getting CLUSTER_CREATION_TIMED_OUT error

Venkateswarlu Pottapalli 0 Reputation points
2024-09-06T22:45:27+00:00

Hi,

My jobs are failing with a timeout before the Spark session starts.

All my pools use the XXL configuration, but the pipeline activities request Small or Medium.

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Vinodh247 22,871 Reputation points
    2024-09-07T13:09:46.44+00:00

    This may be due to a mismatch between the pool configuration (XXL) and the sizes the pipeline jobs request (Small/Medium). Some things to check:

    1. Check Pool Configuration: Ensure that the pools associated with your jobs can support the configurations required for Small or Medium clusters. If your jobs request a smaller cluster configuration but the pool is set up for XXL instances, this mismatch may delay provisioning and lead to timeouts.

    2. Increase Timeout Settings: You may need to increase the cluster creation timeout in your pipeline configuration. If the cluster isn't provisioned within the configured limit, the job fails. A longer timeout gives the pool more time to allocate the necessary resources.

    3. Adjust Job Cluster Size: Ensure that the job configuration within the pipeline aligns with the cluster size supported by the pool. If the jobs use a smaller size (Small or Medium), try adjusting the pool configuration to match.

    4. Cluster Autoscaling: If your pipeline jobs need dynamic scaling, ensure that autoscaling is enabled for your pool. This can help manage resources when jobs of different sizes (Small, Medium, XXL) run against it.

    5. Azure Quota and Resource Availability: Check for quota limits or resource availability issues in your region. If the required instances (e.g., Small or Medium) are not available in sufficient quantity, cluster creation can time out.

    6. Logs and Diagnostics: Review the logs in your Synapse workspace (Monitor hub, Apache Spark applications) for more detail about the cluster creation timeout. This helps pinpoint whether the issue comes from resource availability, a configuration mismatch, or some other factor.
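For the timeout point above: pipeline activities accept a `policy` block with `timeout` (a `d.hh:mm:ss` string), `retry`, and `retryIntervalInSeconds`. Below is a minimal Python sketch that builds such a fragment. The helper names `to_timespan` and `activity_policy` are made up for illustration, and whether the activity timeout also covers the cluster creation phase is an assumption you should verify in your own workspace. Retries at least resubmit the activity when capacity frees up later.

```python
from datetime import timedelta


def to_timespan(td: timedelta) -> str:
    """Format a timedelta as the d.hh:mm:ss string used in activity policies."""
    total = int(td.total_seconds())
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}.{hours:02d}:{minutes:02d}:{seconds:02d}"


def activity_policy(timeout: timedelta, retries: int, retry_interval_s: int) -> dict:
    """Build the 'policy' fragment of a pipeline activity (hypothetical helper)."""
    return {
        "timeout": to_timespan(timeout),
        "retry": retries,
        "retryIntervalInSeconds": retry_interval_s,
    }


# Example values only: 2 h timeout, 3 retries, 5 minutes between attempts.
policy = activity_policy(timedelta(hours=2), retries=3, retry_interval_s=300)
```

The resulting dictionary can be serialized with `json.dumps` and pasted under the activity's `policy` key in the pipeline JSON view.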

    If you can share more detailed logs or error messages, that would help narrow down the issue.
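To make the sizing and quota checks above concrete, here is a rough back-of-the-envelope sketch in Python. The vCore counts per node size follow the Synapse Spark pool documentation. The accounting is a simplification and an assumption: one driver plus N executors, compared against the pool's maximum node count and whatever is already in use. All function names are hypothetical.

```python
# vCores per Synapse Spark node size.
VCORES = {"Small": 4, "Medium": 8, "Large": 16, "XLarge": 32, "XXLarge": 64}


def requested_vcores(executor_size: str, executors: int, driver_size: str = "") -> int:
    """vCores one session asks for: one driver plus N executors (simplified)."""
    driver = VCORES[driver_size or executor_size]
    return driver + executors * VCORES[executor_size]


def pool_vcores(node_size: str, max_nodes: int) -> int:
    """Upper bound on vCores the pool can provide at its maximum node count."""
    return VCORES[node_size] * max_nodes


def fits(requested: int, in_use: int, capacity: int) -> bool:
    """True if the new request fits next to what is already running."""
    return requested + in_use <= capacity


# Example values only: an XXLarge pool capped at 3 nodes, 160 vCores busy.
capacity = pool_vcores("XXLarge", max_nodes=3)
request = requested_vcores("Medium", executors=4)
can_start = fits(request, in_use=160, capacity=capacity)
```

If `can_start` is false for the concurrency your pipeline produces, the session has to wait for capacity, which is one plausible way to run into a creation timeout.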

    1 person found this answer helpful.
