Troubleshoot Apache Flink® cluster configurations on HDInsight on AKS

Note

We will retire Azure HDInsight on AKS on January 31, 2025. Before January 31, 2025, you will need to migrate your workloads to Microsoft Fabric or an equivalent Azure product to avoid abrupt termination of your workloads. The remaining clusters on your subscription will be stopped and removed from the host.

Only basic support will be available until the retirement date.

Important

This feature is currently in preview. The Supplemental Terms of Use for Microsoft Azure Previews include more legal terms that apply to Azure features that are in beta, in preview, or otherwise not yet released into general availability. For information about this specific preview, see Azure HDInsight on AKS preview information. For questions or feature suggestions, please submit a request on AskHDInsight with the details and follow us for more updates on Azure HDInsight Community.

An incorrect cluster configuration may lead to deployment errors. Typically, these errors occur when an incorrect configuration is provided in the ARM template or entered in the Azure portal, for example, on the Configuration management page.

Example configuration error:

Screenshot shows error.

The following table lists error codes and their descriptions to help you diagnose and fix common errors.

Configuration error

| Error code | Description |
| --- | --- |
| FlinkClusterValidator#IdentityValidator | Checks that the task manager (TM) and job manager (JM) process size has the suffix mb. Checks that the TM and JM process size is less than the configured pod memory. |
| FlinkClusterValidator#IdentityValidator | Verifies that the pod identity is configured correctly. |
| FlinkClusterValidator#ClusterSpecValidator | Checks that the CPU configured for the JM, TM, and history server (HS) pods is within the configurable/allocatable SKU limits. Checks that the memory configured for the JM, TM, and HS pods is within the configurable/allocatable SKU limits. |
| FlinkClusterValidator#StorageSpecValidator | Validates that the storage container has an appropriate name. Verifies that the storage type is supported. |
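The two process size checks described in the table can be approximated locally before you deploy. The following is a minimal sketch, not the service's actual validator: the helper names, the strict lowercase `mb` pattern, and the shape of the `pod_memory_mb` argument are assumptions made for illustration. Only the two Flink keys, `jobmanager.memory.process.size` and `taskmanager.memory.process.size`, are standard Flink configuration options.

```python
import re

# Assumption: a valid size is a whole number followed by lowercase "mb".
# The source only states that the suffix must be mb, so this pattern is illustrative.
_SIZE_MB = re.compile(r"^(\d+)mb$")


def check_process_size(name, process_size, pod_memory_mb):
    """Return a list of problems for one component (hypothetical helper)."""
    match = _SIZE_MB.match(process_size.strip())
    if match is None:
        return [f"{name}: process size '{process_size}' must have the suffix mb"]
    size_mb = int(match.group(1))
    # The process size must be strictly less than the configured pod memory.
    if size_mb >= pod_memory_mb:
        return [
            f"{name}: process size {size_mb} MB must be less than "
            f"pod memory {pod_memory_mb} MB"
        ]
    return []


def check_flink_memory(flink_config, pod_memory_mb):
    """Check the JM and TM process sizes against their pod memory.

    flink_config: dict of Flink configuration keys to string values.
    pod_memory_mb: dict such as {"JM": 2048, "TM": 4096} (assumed shape).
    """
    keys = (
        ("JM", "jobmanager.memory.process.size"),
        ("TM", "taskmanager.memory.process.size"),
    )
    problems = []
    for name, key in keys:
        if key in flink_config:
            problems += check_process_size(name, flink_config[key], pod_memory_mb[name])
    return problems
```

For example, calling `check_flink_memory({"taskmanager.memory.process.size": "3g"}, {"JM": 2048, "TM": 4096})` returns one problem for the TM, because the value doesn't end in mb. An empty list means both checks passed under the assumptions above; the service may still reject the configuration for other reasons.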

System error

Some errors occur because of environment conditions and may be transient. The reason for these errors starts with the prefix "System". In such cases, try the following steps:

  1. Collect the following information:

    • Azure request CorrelationId. You can find it in the Notifications area; on the Deployments page of the resource group where the cluster is located; or in the az command output.

    • DeploymentId. You can find it on the cluster Overview page.

    • Detailed error message.

  2. Contact the support team with this information.

| Error code | Description |
| --- | --- |
| System.DependencyFailure | Failure in one of the cluster components. |
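If you script your deployments, the triage described in this section can be sketched as follows: treat a reason that starts with "System" as a system error, and gather the three items support needs before you open a request. The function names and the output layout are hypothetical; only the "System" prefix and the three fields come from this article.

```python
def is_system_error(reason):
    """Return True when the error reason starts with the "System" prefix."""
    return reason.startswith("System")


def support_request(correlation_id, deployment_id, error_message):
    """Bundle the information to send to support (hypothetical layout)."""
    fields = {
        "CorrelationId": correlation_id,
        "DeploymentId": deployment_id,
        "Error message": error_message,
    }
    # Refuse to build an incomplete request, because all three items are needed.
    missing = [label for label, value in fields.items() if not value.strip()]
    if missing:
        raise ValueError("Missing information: " + ", ".join(missing))
    return "\n".join(f"{label}: {value}" for label, value in fields.items())
```

A reason such as `System.DependencyFailure` is classified as a system error, while a code such as `FlinkClusterValidator#ClusterSpecValidator` is not and points to a configuration problem that you can fix yourself.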

Reference