Istio service mesh add-on minor revision upgrade troubleshooting

This article discusses troubleshooting scenarios and restrictions in the minor revision upgrade and rollback processes for the Istio service mesh add-on in Microsoft Azure Kubernetes Service (AKS).

Note

Istio uses the term "revisions" to implement the canary upgrade process and distinguish between versions. Each revision designation (written as x-y) corresponds to a major.minor version designation (x.y). You can control your control plane revision, but you can't control the specific patch version within a revision band.
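The mapping between a major.minor version and its revision designation can be sketched as a small shell helper. This is a minimal illustration, not part of the add-on tooling: the function name revision_label is hypothetical, and the asm- prefix is taken from the istio.io/rev=asm-x-y label that's used later in this article. Because you can't choose the patch version within a revision, the helper rejects input that includes one.

```shell
#!/bin/sh
# Convert a major.minor Istio version (for example, 1.24) into the
# revision form asm-x-y that is used with the istio.io/rev label.
# revision_label is a hypothetical helper name used for illustration.
revision_label() {
  case $1 in
    # Reject non-numeric input, patch versions (x.y.z), and stray dots.
    *[!0-9.]* | *.*.* | .* | *. )
      echo "expected major.minor, got: $1" >&2
      return 1 ;;
    *.*)
      # ${1%%.*} is the part before the dot, ${1#*.} the part after it.
      printf 'asm-%s-%s\n' "${1%%.*}" "${1#*.}" ;;
    *)
      echo "expected major.minor, got: $1" >&2
      return 1 ;;
  esac
}
```

For example, revision_label 1.24 prints asm-1-24, and revision_label 1.24.3 fails with an error message.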

Troubleshooting matrix

The following matrix lists each scenario, the underlying problem, and the solution.

Scenario: Data plane workloads are dropped from the mesh.
Problem: Data plane and control plane revisions didn't correspond before you completed or rolled back an upgrade.
Solution: Follow these steps:

  1. Relabel namespaces that contain workloads by specifying the revision that's expected to exist after the upgrade completion or rollback. To do this, run the kubectl label command:

    kubectl label namespace default istio.io/rev=asm-x-y --overwrite

  2. Restart the corresponding workload deployments to trigger sidecar reinjection of the correct revision. To do this, run the kubectl rollout restart command:

    kubectl rollout restart deployment <deployment name>

  3. Verify that the sidecar images exist. To do this, run the kubectl get command:

    kubectl get pods --namespace <namespace> --output yaml | grep mcr.microsoft.com/oss/istio/proxyv2:

Scenario: Control plane pods are in the pending state.
Problem: The pods lack capacity.
Solution: Verify the state of the pods by running the kubectl describe command. If capacity is the problem, you can scale up your cluster to add another node. For more information, see Manually scale the node count in an Azure Kubernetes Service (AKS) cluster.

Scenario: The az aks mesh get-upgrades command returns no available upgrades.
Problem: The newest Istio revision might be incompatible with the current AKS cluster version.
Solution: You can use the az aks mesh get-revisions command to discover whether newer Istio revisions exist. The output includes a list of compatible cluster versions for each Istio revision. Therefore, you can determine whether a cluster upgrade is necessary.
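The relabel, restart, and verify steps for dropped workloads, together with the pending-pod check, can be wrapped in a few shell functions. Treat this as a sketch under stated assumptions rather than a supported script: the function names and the KUBECTL override are hypothetical conveniences, and the aks-istio-system namespace is assumed to be where the add-on runs its control plane pods.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example, KUBECTL="echo kubectl") to
# preview the commands without touching a cluster. The override is an
# addition for this sketch, not a kubectl feature.
KUBECTL=${KUBECTL:-kubectl}

# Steps 1 and 2: relabel one namespace with the expected revision, and
# then restart one deployment in it to trigger sidecar reinjection.
# Arguments: namespace, revision (asm-x-y), deployment name.
remesh_deployment() {
  namespace=$1
  revision=$2
  deployment=$3
  $KUBECTL label namespace "$namespace" "istio.io/rev=$revision" --overwrite &&
    $KUBECTL rollout restart deployment "$deployment" --namespace "$namespace"
}

# Step 3: read pod YAML on stdin and print each distinct proxyv2 sidecar
# image once, so that a leftover image from another revision stands out.
proxy_images() {
  grep -o 'mcr\.microsoft\.com/oss/istio/proxyv2:[^"[:space:]]*' | sort -u
}

# List control plane pods that are stuck in the Pending phase.
# Assumption: the control plane runs in the aks-istio-system namespace.
pending_control_plane_pods() {
  $KUBECTL get pods --namespace aks-istio-system \
    --field-selector=status.phase=Pending
}
```

A typical sequence is to run remesh_deployment default asm-x-y <deployment name>, and then pipe kubectl get pods --namespace default --output yaml into proxy_images. If more than one image tag is printed, some pods are probably still running the sidecar from the other revision.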

Note

To avoid unintended behavior and broken functionality, and to make sure that you receive updates for security vulnerabilities, we strongly recommend that you upgrade to a supported and up-to-date AKS version and Istio add-on revision. Remember that the add-on revision must also be compatible with the Kubernetes version of the given AKS cluster. As highlighted in the Minor revision upgrade section of the Istio upgrade article, you can run the az aks mesh get-revisions and az aks mesh get-upgrades commands to learn about available add-on revisions, upgrades, and compatibility information.
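The compatibility check that's described here can be sketched in shell. The helper name is_compatible is hypothetical, and the sketch assumes that you copy the compatible Kubernetes versions for a revision from the az aks mesh get-revisions output by hand and that they're written in major.minor form. It doesn't parse the command output itself, because the exact output shape isn't covered in this article.

```shell
#!/bin/sh
# Commands that supply the input (placeholders in angle brackets):
#   az aks mesh get-revisions --location <location> --output table
#   az aks mesh get-upgrades --resource-group <resource group> --name <cluster name>
#   az aks show --resource-group <resource group> --name <cluster name> \
#     --query kubernetesVersion --output tsv

# Succeed when the cluster's Kubernetes major.minor version appears in
# the list of versions that a revision is compatible with.
# Arguments: cluster version (for example, 1.29.4), followed by the
# compatible versions (assumed to be major.minor, for example, 1.29).
is_compatible() {
  cluster=$1
  shift
  # Reduce the cluster version to major.minor: 1.29.4 becomes 1.29.
  major=${cluster%%.*}
  rest=${cluster#*.}
  minor=${rest%%.*}
  for candidate in "$@"; do
    [ "$candidate" = "$major.$minor" ] && return 0
  done
  return 1
}
```

For example, is_compatible 1.29.4 1.28 1.29 1.30 succeeds, and is_compatible 1.27.9 1.28 1.29 1.30 fails, which suggests that a cluster upgrade is needed before that revision becomes available.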

Restrictions

  • A downgrade to an older revision (outside the canary rollback process) isn't allowed.

  • Skipping from one revision to a nonconsecutive revision is allowed only if AKS supports neither the current revision nor the next consecutive revision. In that case, the only upgrade that's available to you is the lowest supported revision.

  • The Istio sidecar.istio.io/inject label doesn't enable sidecar injection for the Istio add-on. You must use the istio.io/rev label when you label and relabel your namespaces during the canary upgrade.

  • Labeling must occur at the namespace level, not at the deployment level. If you want to roll over pods individually, restart individual deployments instead of labeling pods.

  • If you're using the Istio add-on Shared MeshConfig, you have to copy or transfer MeshConfig settings to the new ConfigMap before you do a canary upgrade. For more information, see Mesh configuration and upgrades.

  • The Istio add-on deploys Istio ingress gateway pods and deployments per revision. If you're doing a canary upgrade and have two control plane revisions installed in your cluster, you might have to troubleshoot multiple ingress gateway pods across both revisions.
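The downgrade and revision-skipping restrictions can be checked before you start an upgrade. The following sketch uses a hypothetical helper named classify_upgrade. It assumes that both arguments are well-formed asm-x-y revisions that share the same major version, and that consecutive revisions differ by exactly one in the minor number.

```shell
#!/bin/sh
# Print how a target revision relates to the current one:
#   downgrade  not allowed outside the canary rollback process
#   same       no revision change
#   next       a consecutive minor revision upgrade
#   skip       allowed only when AKS supports neither the current
#              revision nor the next consecutive one
# Arguments: current revision, target revision (both asm-x-y).
classify_upgrade() {
  cur=${1#asm-}
  tgt=${2#asm-}
  cur_major=${cur%%-*}
  cur_minor=${cur#*-}
  tgt_major=${tgt%%-*}
  tgt_minor=${tgt#*-}
  if [ "$cur_major" -ne "$tgt_major" ]; then
    echo "major versions differ; not handled by this sketch" >&2
    return 1
  fi
  if [ "$tgt_minor" -lt "$cur_minor" ]; then
    echo downgrade
  elif [ "$tgt_minor" -eq "$cur_minor" ]; then
    echo same
  elif [ "$tgt_minor" -eq $((cur_minor + 1)) ]; then
    echo next
  else
    echo skip
  fi
}
```

For example, classify_upgrade asm-1-23 asm-1-24 prints next, and classify_upgrade asm-1-22 asm-1-24 prints skip. The helper only classifies the request. Whether a skip is actually available still depends on what az aks mesh get-upgrades returns for your cluster.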
