Velero backup fails on Arc-enabled AKS cluster on Azure Stack HCI

Saravanan Krishnan 40 Reputation points
2024-02-15T08:23:18.8266667+00:00

We created an Azure Arc-enabled AKS cluster on Azure Stack HCI and deployed Velero to back it up. The backup fails:

NAME            STATUS   ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
tomcat-backup   Failed   0        0          2024-02-15 07:14:27 +0000 UTC   29d       default           


1 answer

  1. Achraf Ben Alaya 976 Reputation points MVP
    2024-02-15T09:09:49.4333333+00:00

    The error messages you're encountering indicate a failure to refresh the access token required for Velero to communicate with Azure resources. This typically happens due to networking issues or authentication problems. Here are some steps you can take to troubleshoot and potentially resolve the issue:

    1. Check Network Connectivity:
      • Ensure that the Velero pod has outbound internet access to communicate with Azure services. The error message mentions a timeout when trying to access login.microsoftonline.com, which indicates a possible network connectivity issue.
      • Check if there are any network restrictions, firewalls, or proxy configurations that might be blocking the outgoing requests from the Velero pod.
    2. Verify Azure Credentials:
      • Double-check the Azure credentials (service principal or managed identity) used by Velero to ensure they are correct and have the necessary permissions to access the Azure resources.
      • Make sure the Azure credentials are correctly configured in the Velero deployment.
    3. Token Refresh Configuration:
      • Review the configuration related to token refresh in your Velero deployment. There might be settings that control how often Velero refreshes its access token. Adjusting these settings could potentially mitigate the issue.
      • Ensure that any time synchronization mechanisms in your cluster are working correctly, as token refresh failures can sometimes be caused by clock skew issues.
    4. Retry the Backup Operation:
      • Sometimes, transient issues can cause token refresh failures. Retry the backup operation after ensuring that any temporary networking issues have been resolved.
    5. Check Azure Service Status:
      • Occasionally, Azure services might experience downtime or issues that could affect authentication. Check the Azure status page or any relevant service health dashboards to see if there are any ongoing incidents.
    6. Update Velero and Azure Provider:
      • Ensure that you are using the latest versions of Velero and the Azure provider plugin. Bugs or issues related to token refresh might have been addressed in newer releases.
    7. Logging and Monitoring:
      • Enable detailed logging in Velero to capture more information about the token refresh process and any potential errors.
      • Monitor the Azure activity logs and diagnostic logs to see if there are any relevant entries that could provide additional insights into the issue.
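
The network connectivity check above could be sketched as a small TCP probe. This is only an illustration, not part of Velero: the function name `can_reach`, the default port 443, and the 5-second timeout are assumptions made for this example. It only tells you whether a TCP connection can be opened; it does not validate credentials or proxy settings.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration; DNS failures, refused connections,
    and timeouts all count as "not reachable".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.gaierror (DNS), ConnectionRefusedError, and timeouts
        # are all subclasses of OSError.
        return False
```

For the result to be meaningful, something like `can_reach("login.microsoftonline.com")` would need to run from inside the cluster (for example, from a temporary pod in the same namespace as Velero), because a workstation may take a different network path than the pod does.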

    By following these steps and investigating the potential causes outlined above, you should be able to diagnose and resolve the token refresh failure issue with Velero backups in your Azure Arc-enabled cluster.
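
For the clock skew point mentioned under token refresh, one rough way to estimate skew is to compare the node's clock with the `Date` header of any HTTPS response. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the 300-second tolerance is an arbitrary example threshold, not a documented Azure or Velero limit.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_skew_seconds(http_date: str, local_now: datetime) -> float:
    """Return local time minus server time, in seconds.

    `http_date` is the value of an HTTP `Date` response header, for example
    "Thu, 15 Feb 2024 07:14:27 GMT". `local_now` must be timezone-aware.
    """
    server_time = parsedate_to_datetime(http_date)
    return (local_now - server_time).total_seconds()


def skew_is_acceptable(skew_seconds: float, tolerance_seconds: float = 300.0) -> bool:
    """Return True if the absolute skew is within the (illustrative) tolerance."""
    return abs(skew_seconds) <= tolerance_seconds
```

In practice `local_now` would be `datetime.now(timezone.utc)` taken on the node or pod being checked. A large value suggests that time synchronization on the cluster nodes is worth inspecting before looking further at the token refresh itself.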
