Management and operations for the Azure Container Apps - Landing Zone Accelerator

Review features and services of Azure Container Apps available to help you design and maintain your app for long-term health and stability.

  • Understand the Container Apps limits.

  • Consider isolating workloads at the network, compute, monitoring, or data level.

  • Understand ways to control resource consumption by workloads.

  • Use health probes to help report and recover from deteriorating application health.

  • Use Dapr to establish secure connections to external services.

  • Use logging and monitoring to provide insight into any issues associated with your applications.

  • Use alerts for critical application and system events so that operations staff can take swift action in the event of application failures.

  • Define a scaling strategy to ensure enough capacity is available to handle traffic to your application, while minimizing unused capacity. Scaling triggers include CPU or memory usage, along with any KEDA-supported scaler.

  • Be familiar with Envoy as Azure Container Apps uses it as a network proxy.

  • Be aware of recovery time objective (RTO) and recovery point objective (RPO) requirements around Business Continuity and Disaster Recovery. Define a service-level agreement (SLA) for your infrastructure and application. Learn about the SLA for Azure Container Apps. See the SLA details section for information about monthly uptime calculations.

  • Depending on the specific requirements for your application, you may need to use high-availability measures to ensure continued operation if there are issues with the underlying Azure platform. In Azure, the various zones and regions allow you to build solutions for high-availability:

    • Availability Zones are fault isolation constructs in Azure datacenter design. Each zone has its own power, network and cooling to minimize the chance of outages spreading across zones. To use Availability Zones, each Azure resource can be deployed either to a specific zone ("zonal") or to all zones ("zone redundant").

    • Multi-region solutions provide the highest level of fault isolation and the highest reliability, but are often more difficult to implement because of the higher latency between the geographic regions. This latency can cause data-replication delays. For more information on multi-region design, see the Azure Mission Critical documentation.

  • Consider using Azure DevOps and GitHub to provide automated ways of managing development, build, and deployment processes.
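When you define an SLA for your own infrastructure and application, as the RTO/RPO consideration above suggests, it can help to translate an uptime percentage into a monthly downtime budget. The following is a minimal sketch of that arithmetic, not the official SLA calculation. The function names, the 30-day month, and the 99.95 and 99.99 figures are illustrative assumptions; use the percentages and the month definition from the SLA documents that apply to your services.

```python
def allowed_downtime_minutes(sla_percent: float, days_in_month: int = 30) -> float:
    """Return the downtime budget, in minutes, implied by an uptime percentage.

    Assumes a fixed-length month (30 days by default) for simplicity.
    """
    if not 0 < sla_percent <= 100:
        raise ValueError("sla_percent must be in the range (0, 100]")
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


def composite_sla_percent(*sla_percents: float) -> float:
    """Return the combined uptime percentage of services that must all be up.

    Assumes the services fail independently and that a request needs
    every one of them, so the availabilities multiply.
    """
    availability = 1.0
    for percent in sla_percents:
        availability *= percent / 100
    return availability * 100


if __name__ == "__main__":
    # Illustrative figures only, not quoted SLA values.
    print(round(allowed_downtime_minutes(99.95), 1))       # about 21.6 minutes
    print(round(composite_sla_percent(99.95, 99.99), 4))   # lower than either input
```

Because availabilities multiply, every dependency in the request path lowers the uptime you can realistically promise, which is one reason to include adjacent services in high-availability planning.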

Recommendations

  • Isolate by environment: Create distinct Container Apps environments for full resource isolation. Avoid using revisions to create tenant-specific container apps. For more information, see Azure Container Apps in multitenant solution.

  • Set limits on compute resources: Use container CPU and memory resource requests and limits to manage the compute and memory resources within an environment. Container default limits are 2 vCPU and 4 GiB for compute and memory respectively.

  • Use health probes: Add health probes to your container apps. Make sure revisions contain livenessProbe, readinessProbe, and startupProbe. For more information, see Azure Container Apps health probes.

  • Configure health probes correctly: The health probe is responsible for making calls to an endpoint and expects to receive a success status code, typically in the HTTP 2xx range, when the system is in a healthy state. It is recommended that this endpoint performs checks not only on the system's health but also on the health of critical downstream components, such as databases, storage, and messaging services. To prevent a continuous cascade of health checks, it's important to implement caching of the downstream health responses for a brief duration.

  • Log extensively: Create Log Analytics queries to look for warnings, errors, and critical messages.

    • Application logs are generated from the console output (stdout/stderr) of your containers. When Dapr is enabled, console output contains both application container and Dapr sidecar messages. Review Log monitoring for more detail on how to query logs using Log Analytics.

    • System logs are generated by Azure Container Apps.

  • Enable visual tracing: When you enable Dapr, configure the `DaprAIInstrumentationKey` at the ACA environment level to visualize distributed tracing for your container apps in the Azure Application Insights application map.

  • Use the Application Insights SDK: Use the Application Insights SDK to collect application data, because the auto-instrumentation agent isn't supported yet.

  • Use availability zones: When you need high-availability, use Availability Zones on all resources. Ensure that not only your Container Apps are zone redundant, but also adjacent services required to fulfill requests, such as databases, storage and messaging services.

  • Use distributed replication: For disaster recovery (DR) purposes, ensure your application data and source code are available in more than one Azure region. For example, Azure Storage accounts allow geo-replicated storage and Azure SQL Databases allow read-replicas to be placed in other regions.

  • Automate builds: Use end-to-end automation to build and deploy your Azure Container Apps applications.

  • Use a container registry: Store your container images in Azure Container Registry and geo-replicate the registry to each ACA region.

  • Test your disaster recovery plan: Create and test a disaster recovery plan regularly using key failure scenarios. For more information, see Testing backup and disaster recovery.
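The health probe recommendation above asks for an endpoint that also checks critical downstream components while caching those results briefly. The following is a minimal sketch of that pattern using only the Python standard library. The class name `CachedHealthCheck`, the `/healthz` path, port 8080, the 10-second cache lifetime, and the placeholder checks are all assumptions made for illustration; they are not part of Azure Container Apps.

```python
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple

CACHE_TTL_SECONDS = 10.0  # assumed "brief duration"; tune for your probe interval


class CachedHealthCheck:
    """Runs downstream checks and reuses each result for a short time."""

    def __init__(self, checks: Dict[str, Callable[[], bool]],
                 ttl: float = CACHE_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic):
        self._checks = checks
        self._ttl = ttl
        self._clock = clock
        self._cache: Dict[str, Tuple[float, bool]] = {}

    def _check(self, name: str) -> bool:
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh, skip the downstream call
        try:
            healthy = bool(self._checks[name]())
        except Exception:
            healthy = False  # a failing check counts as unhealthy
        self._cache[name] = (now, healthy)
        return healthy

    def status(self) -> Tuple[int, Dict[str, bool]]:
        """Return (HTTP status code, per-dependency results)."""
        results = {name: self._check(name) for name in self._checks}
        return (200 if all(results.values()) else 503), results


def make_handler(health: CachedHealthCheck):
    """Build a request handler class bound to the given health checker."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/healthz":  # hypothetical probe path
                self.send_error(404)
                return
            code, results = health.status()
            body = json.dumps(results).encode("utf-8")
            self.send_response(code)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return HealthHandler


if __name__ == "__main__":
    # Placeholder checks; replace with real database, storage, and messaging pings.
    health = CachedHealthCheck({"database": lambda: True, "storage": lambda: True})
    HTTPServer(("0.0.0.0", 8080), make_handler(health)).serve_forever()
```

With this shape, frequent probe calls within the cache lifetime return the stored result instead of querying every dependency again, which limits the cascade of health checks described above. Whether a downstream failure should fail a liveness probe, a readiness probe, or both is a design decision this sketch does not make.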