What is the optimal architecture design on Azure for an infrequently used backend that needs a robust configuration?

Renaud Chrétien 1 Reputation point
2022-03-15T13:48:54.433+00:00

I'm trying to find the optimal cloud architecture to host a piece of software on Microsoft Azure.

The scenario is the following:

  • A (containerised) REST API is exposed to the users, who can submit POST and GET requests through it. POST requests trigger a backend that needs a robust configuration to operate properly, and GET requests fetch the result of the backend, if any. This component of the solution is currently hosted on an Azure Web App Service, which does the job perfectly.
  • The (containerised) backend (triggered by POST requests) performs heavy calculations for a short amount of time (typically 5-10 minutes are allotted for the calculation). This backend needs (at least) 4 cores and 16 GB RAM, but the more the better.

The current configuration consists of the backend hosted together with the REST API on the App Service, with a plan that accommodates the backend's requirements. This is clearly not very cost-efficient, as the backend is idle ~90% of the time. On top of that, it's not really scalable, despite an automatic scaling rule that spawns new instances based on CPU usage: if several POST requests arrive at the same time, they can be handled by the same instance and make it crash due to a lack of memory.

  • Azure Functions doesn't seem to be an option: the serverless (consumption plan) solution is restricted to 1.5 GB RAM and doesn't have Docker support. More expensive solutions, such as the Azure Functions EP3 plan, need to have one instance allocated all the time, which makes that choice not cost-effective and quite similar to the App Service.
  • Azure Container Instances isn't an option either: first, the maximum number of CPUs is 4 (which is really low for the needs here, although acceptable), and second, there are cold starts of approximately 2 minutes (I imagine due to the creation of the container group, the image pull, and so on). Although the process is asynchronous from a user perspective, high latency is not acceptable because the result is expected within 5-10 minutes, so cold starts are a problem.
  • Azure Batch, which at first glance appears to be a perfect fit (beefy configurations available, made for HPC, cost-effective, made for time-limited tasks, ...), seems to be slow too (it takes a couple of minutes to resize a pool, and jobs don't run immediately when submitted).

So to reformulate a bit: is it possible to have a solution that scales to 0 instances (so really no cost) when there's no backend work to process, and that can scale up almost instantaneously (let's say within 1 or 2 minutes) to n instances of a beefy config (let's say 8 CPUs/16 GB RAM)?
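
The scaling behaviour asked for above can be written down as a small rule. This is only a sketch of the requirement, not a feature of any particular Azure service: the function name `desired_instances` and the parameters `jobs_per_instance` and `max_instances` are hypothetical. Running one job per instance is assumed as the default so that simultaneous POST requests never share memory on the same instance.

```python
import math


def desired_instances(pending_jobs: int,
                      jobs_per_instance: int = 1,
                      max_instances: int = 10) -> int:
    """Return how many backend instances should exist for the pending work.

    Hypothetical rule: 0 instances when nothing is pending, otherwise enough
    instances to hold every pending job, capped at max_instances.
    """
    if pending_jobs < 0 or jobs_per_instance < 1 or max_instances < 0:
        raise ValueError("invalid scaling parameters")
    # ceil() so that a partially filled instance still gets created
    needed = math.ceil(pending_jobs / jobs_per_instance)
    return min(max_instances, needed)
```

With no pending work the rule returns 0, which is the "really no cost" case; any platform that implements it still has to bring those instances up within the 1 or 2 minutes mentioned above.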

Do you have any idea what I could use?

Thanks in advance!

PS: this question follows from the original stackoverflow post: https://stackoverflow.com/questions/71411555/what-is-the-optimal-architecture-design-on-azure-for-an-infrequently-used-backen/71413699#71413699

Azure Cloud Services
Azure Kubernetes Service (AKS)

2 answers

  1. Evgeny Grishchenko 486 Reputation points
    2024-01-09T09:33:14.1+00:00

    Hi, it's always a trade-off between inexpensive and fast; you can't have both. As soon as you scale to 0 instances, cold start becomes an issue. Based on your non-functional requirements, I would say there is no solution that satisfies all of them. In that case it makes sense to look into your backend and try to optimise it to reduce the cold start time, so that you can benefit from pay-as-you-go plans with 0 instances.
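
The trade-off can be made concrete with the figures from the question (a cold start of about 2 minutes, a 5-10 minute budget, and a backend that is idle ~90% of the time). The sketch below is plain arithmetic under a simplifying assumption: cost is taken to be proportional to allocated time at the same rate, which ignores real price differences between plans. Both function names are hypothetical.

```python
def cold_start_share(cold_start_min: float, budget_min: float) -> float:
    """Fraction of the allowed response time consumed by the cold start."""
    if budget_min <= 0:
        raise ValueError("budget must be positive")
    return cold_start_min / budget_min


def always_on_cost_multiplier(idle_fraction: float) -> float:
    """How much more an always-on instance costs per hour of useful work,
    assuming (simplification) the same hourly rate as a pay-per-use option."""
    if not 0 <= idle_fraction < 1:
        raise ValueError("idle_fraction must be in [0, 1)")
    return 1.0 / (1.0 - idle_fraction)


if __name__ == "__main__":
    print(cold_start_share(2, 5))            # share of a 5-minute budget
    print(cold_start_share(2, 10))           # share of a 10-minute budget
    print(always_on_cost_multiplier(0.9))    # always-on vs. pay-per-use
```

Under these assumptions a 2-minute cold start uses between a fifth and two fifths of the time budget, while staying always-on costs roughly ten times as much per hour of actual computation.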


  2. AlaaBarqawi_MSFT 942 Reputation points Microsoft Employee
    2024-01-09T10:33:11.5366667+00:00

    Hi Renaud Chrétien

    I suggest using AKS for your backend, since it is containerised. You have two options for saving costs:

    1. Write an automation script that creates the AKS cluster with the required parameters and application images, and tears it down once the job has finished. Terraform is one option for the automation.

    2. Create the AKS cluster once and, after the batch job finishes, stop the cluster from the Azure portal or the CLI, so that the cost stays minimal.

    To better optimize your costs during these periods, you can turn off, or stop, your cluster. This action stops your control plane and agent nodes, allowing you to save on all the compute costs, while maintaining all objects except standalone pods. The cluster state is stored for when you start it again, allowing you to pick up where you left off.

    az aks stop --name myAKSCluster --resource-group myResourceGroup
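
The second option could be automated along the following lines. This is a minimal sketch, assuming the Azure CLI is installed and logged in: it wraps the `az aks stop` command shown above and its counterpart `az aks start` around a job, and always stops the cluster afterwards. The helper names (`aks_power_command`, `run_with_cluster`) and the `runner` parameter are hypothetical and exist only for illustration.

```python
import subprocess
from typing import Callable, List, Optional


def aks_power_command(action: str, cluster: str, resource_group: str) -> List[str]:
    """Build the `az aks start` / `az aks stop` command as an argument list."""
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action!r}")
    return ["az", "aks", action, "--name", cluster, "--resource-group", resource_group]


def _run_cli(cmd: List[str]) -> None:
    """Default runner: execute the command and raise if it fails."""
    subprocess.run(cmd, check=True)


def run_with_cluster(job: Callable[[], None],
                     cluster: str,
                     resource_group: str,
                     runner: Optional[Callable[[List[str]], object]] = None) -> None:
    """Start the cluster, run the job, and stop the cluster even if the job fails."""
    run = runner or _run_cli
    run(aks_power_command("start", cluster, resource_group))
    try:
        job()
    finally:
        # Stop in all cases so a failed job does not leave compute running
        run(aks_power_command("stop", cluster, resource_group))
```

A call might look like `run_with_cluster(submit_backend_job, "myAKSCluster", "myResourceGroup")`, where `submit_backend_job` stands for whatever triggers the calculation. Starting a stopped cluster is not instantaneous, so the start time should be measured against the latency requirement in the question before relying on this pattern.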
     
    