Limiting Number of Requests or Queuing Process on Azure Web App (App Service Plan P2V3)

Question

I am using an Azure Web App on an App Service Plan P2V3 and am looking to limit the number of incoming requests or manage the queuing process within the web app service.

In Windows IIS, we can set the application pool queue length to control this. Is there a similar feature available for Azure App Services?

From my understanding of the available documentation (though I'm not sure I'm heading in the right direction):

The maximum IP connections are per instance and depend on the instance size:

- 1,920 per B1/S1/P1V3 instance
- 3,968 per B2/S2/P2V3 instance
- 8,064 per B3/S3/P3V3 instance
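As a rough back-of-the-envelope sketch, assuming these limits apply independently per instance and the front end spreads connections evenly (real traffic won't do this perfectly), the aggregate ceiling grows with the instance count. `MAX_IP_CONNECTIONS` and `plan_ceiling` are hypothetical helpers, not an Azure API.

```python
# Per-instance "maximum IP connections" limits, as quoted above (P-series only).
MAX_IP_CONNECTIONS = {
    "P1V3": 1_920,
    "P2V3": 3_968,
    "P3V3": 8_064,
}


def plan_ceiling(sku: str, instances: int) -> int:
    """Rough aggregate ceiling: per-instance limit times instance count.

    Assumes each instance can reach its own limit, i.e. the front end
    spreads connections evenly across instances.
    """
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return MAX_IP_CONNECTIONS[sku] * instances


print("1 x P2V3:", plan_ceiling("P2V3", 1))  # current plan
print("1 x P3V3:", plan_ceiling("P3V3", 1))  # scale up
print("2 x P2V3:", plan_ceiling("P2V3", 2))  # scale out
```

This prints 3968, 8064 and 7936, so on this metric alone two P2V3 instances land close to one P3V3.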

Additionally, the maxConcurrentRequestsPerCPU setting specifies how many simultaneous requests ASP.NET allows per CPU:

- Small (A1): 7,500 requests
- Medium (A2): 7,500 requests (x2 cores = 15,000)
- Large (A3): 18,750 requests (x4 cores = 75,000)
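Because maxConcurrentRequestsPerCPU is a per-core value, the instance-wide ceiling is that value multiplied by the core count. A quick check of the figures quoted above; the `SIZES` table and `effective_limit` helper are illustrative only:

```python
# Per-CPU values and core counts as quoted in the question.
SIZES = {
    "Small (A1)": (7_500, 1),
    "Medium (A2)": (7_500, 2),
    "Large (A3)": (18_750, 4),
}


def effective_limit(per_cpu: int, cores: int) -> int:
    """maxConcurrentRequestsPerCPU applies per core, so the instance-wide
    ceiling is the per-CPU value multiplied by the number of cores."""
    return per_cpu * cores


for name, (per_cpu, cores) in SIZES.items():
    print(f"{name}: {per_cpu:,} x {cores} = {effective_limit(per_cpu, cores):,}")
# Small (A1): 7,500 x 1 = 7,500
# Medium (A2): 7,500 x 2 = 15,000
# Large (A3): 18,750 x 4 = 75,000
```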

Given these settings, I have the following questions:

Is it possible to configure a specific limit on the number of incoming requests for an Azure Web App?
Can we limit or control the queuing process in a similar way to the IIS application pool queue length?
What configuration changes are required to achieve this, if possible?
Can scaling up (increasing instance size) help in managing request limits effectively?
Are there any best practices or configurations to achieve this limit?
I saw an article suggesting HttpClientFactory for managing HttpClient instances (as described in this documentation). Could that help? I'm not sure whether it is correct.

Any insights or recommendations on managing request limits and queuing for an Azure Web App would be greatly appreciated.

Answer

Hi, thanks for the question.

Generally, in a cloud PaaS hosting scenario you would not try to throttle the site.

One of the things you're buying into with App Service is a turnkey web farm architecture that can inflate (scale out) on demand. In general terms, you load test to find the right vertical instance size (based on memory and concurrency requirements). You want a "just big enough" compute instance that handles a reasonable number of concurrent requests without creating a "long tail" of slow, suboptimal requests that hurts the end-user experience. In other words, you need enough compute power that as close to 100% of requests as possible are handled within the SLA you set (for example, an average latency target for an API action or a web page/content request).
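To make "as close to 100% within the SLA" concrete, a load test run can be summarized as the fraction of requests under the SLA threshold plus a tail percentile such as p95. A minimal sketch, assuming you have exported per-request latencies in milliseconds; the sample values and the 500 ms SLA are made up for illustration:

```python
import math


def fraction_within_sla(latencies_ms: list[float], sla_ms: float) -> float:
    """Share of requests that completed within the SLA threshold."""
    if not latencies_ms:
        raise ValueError("no samples")
    within = sum(1 for latency in latencies_ms if latency <= sla_ms)
    return within / len(latencies_ms)


def percentile(latencies_ms: list[float], pct: float) -> float:
    """Nearest-rank percentile, e.g. pct=95 for p95 (exposes the long tail)."""
    if not latencies_ms or not 0 < pct <= 100:
        raise ValueError("need samples and 0 < pct <= 100")
    ordered = sorted(latencies_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


# Hypothetical samples from a load test against one instance size.
samples = [120, 135, 150, 160, 180, 210, 240, 300, 450, 1200]
print(fraction_within_sla(samples, sla_ms=500))  # 0.9
print(percentile(samples, 95))  # 1200
```

Here 90% of requests meet the SLA but p95 sits at 1200 ms: the long tail that a slightly bigger instance size (or more instances) would aim to remove.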

Then you use scale out and scale in (inflate and deflate) to let the service add instances during load peaks and remove them again afterwards, again while serving as close to 100% of requests as possible within the SLA.
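Inflate/deflate is normally configured as App Service autoscale rules, not written as code. Purely to illustrate the logic, here is a hypothetical rule evaluator that adds an instance when average CPU is high and removes one when it is low, within min/max bounds. `ScaleRule`, its thresholds and `next_instance_count` are assumptions; real autoscale also applies time windows and cool-down periods that this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class ScaleRule:
    """Hypothetical autoscale settings: thresholds on average CPU %."""
    min_instances: int = 1
    max_instances: int = 10
    scale_out_above: float = 70.0
    scale_in_below: float = 30.0


def next_instance_count(current: int, avg_cpu: float, rule: ScaleRule) -> int:
    """Add one instance under load, remove one when idle, stay within bounds."""
    if avg_cpu > rule.scale_out_above:
        target = current + 1
    elif avg_cpu < rule.scale_in_below:
        target = current - 1
    else:
        target = current
    return max(rule.min_instances, min(rule.max_instances, target))


rule = ScaleRule(min_instances=2, max_instances=5)
print(next_instance_count(2, 85.0, rule))  # 3: inflate during a peak
print(next_instance_count(3, 20.0, rule))  # 2: deflate afterwards
print(next_instance_count(2, 20.0, rule))  # 2: never below the minimum
```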

So my first question is: why are you looking to throttle this particular application? On an IIS host running on a VM, changing the thread/core settings and altering the queue limits would eventually cause 503s to be returned to the caller once the box becomes saturated. That is not ideal for the client or user, which, if coded well, would probably back off and retry later. But doing that increases latency, which is generally bad for the end user.
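A well-behaved client's back-off-and-retry might look roughly like this sketch. `send` is a hypothetical callable returning the status code and an optional Retry-After value in seconds; the attempt count and delays are assumptions. It also retries 429, which rate limiters commonly return.

```python
import random
import time
from typing import Callable, Optional, Tuple

RETRYABLE = {429, 503}  # Too Many Requests, Service Unavailable


def call_with_backoff(
    send: Callable[[], Tuple[int, Optional[float]]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` until it returns a non-retryable status or attempts run out.

    Honours a Retry-After value when present; otherwise waits a random time
    between 0 and an exponentially growing cap ("full jitter").
    """
    status = 0
    for attempt in range(max_attempts):
        status, retry_after = send()
        if status not in RETRYABLE:
            return status
        if attempt == max_attempts - 1:
            break  # out of attempts; don't sleep for nothing
        if retry_after is not None:
            delay = retry_after
        else:
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
        sleep(delay)
    return status


# Usage with a fake server: 503, then 503 with Retry-After, then success.
responses = iter([(503, None), (503, 1.0), (200, None)])
waits = []
print(call_with_backoff(lambda: next(responses), sleep=waits.append))  # 200
print(waits[1])  # 1.0, taken from Retry-After
```

Every retry adds waiting time on the client side, which is exactly the latency cost described above.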

You also don't mention whether the app uses the legacy .NET Framework (your screen snip of the application pool suggests so) or ASP.NET Core.

Back to the question: App Service is a two-tier architecture, with a front end that reverse proxies the traffic and load balances it across one or more worker VM instances (more if you scale out). The workers are identical replicas pointing back at shared storage (wwwroot). The app pool and worker process are controlled by App Service itself.
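A toy model of the front-end tier, assuming simple round-robin distribution across identical workers. App Service's actual balancing algorithm is not described here, and App Service can also pin a client to one instance via ARR affinity cookies, which this sketch ignores. `FrontEnd` and the worker names are hypothetical.

```python
from itertools import cycle


class FrontEnd:
    """Toy reverse proxy spreading requests across identical worker replicas.
    Round robin is an illustrative assumption, not App Service's algorithm."""

    def __init__(self, workers: list[str]):
        if not workers:
            raise ValueError("need at least one worker")
        self._next = cycle(workers)

    def route(self, path: str) -> str:
        worker = next(self._next)
        # Every replica serves the same shared wwwroot, so any worker can
        # handle any path.
        return f"{worker} serves {path}"


fe = FrontEnd(["worker-0", "worker-1"])  # two instances after scaling out
for p in ["/", "/api/orders", "/about"]:
    print(fe.route(p))
```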

On Windows App Service there is a small amount of control over the worker host, because some IIS host settings can be overridden through the application's root-level web.config, which is deployed with the application code.

However, in this case the settings you mention, which relate to .NET Framework ASP.NET, would (I believe) belong in aspnet.config and not web.config. REF https://video2.skills-academy.com/en-us/dotnet/framework/configure-apps/file-schema/web/?redirectedfrom=MSDN
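The behaviour the question is after can be modelled as an admission gate: a fixed number of requests run concurrently (roughly the role of maxConcurrentRequestsPerCPU x cores), a bounded queue holds the overflow (roughly the role of a request queue limit or the app pool queue length), and anything beyond that gets a 503. This is a simplified, single-threaded sketch with hypothetical names; the real IIS/ASP.NET pipeline has more than one queue.

```python
from collections import deque
from typing import Optional


class AdmissionGate:
    """Toy model of IIS/ASP.NET-style admission control: up to
    `max_concurrent` requests run at once, up to `queue_limit` more wait in
    a FIFO queue, and anything beyond that is rejected with a 503."""

    def __init__(self, max_concurrent: int, queue_limit: int):
        self.max_concurrent = max_concurrent
        self.queue_limit = queue_limit
        self.running = 0
        self.queue = deque()

    def arrive(self, request_id: str) -> str:
        if self.running < self.max_concurrent:
            self.running += 1
            return "run"
        if len(self.queue) < self.queue_limit:
            self.queue.append(request_id)
            return "queued"
        return "503"  # queue full: Service Unavailable

    def complete(self) -> Optional[str]:
        """A running request finishes. If anything is queued, the oldest
        waiting request takes the freed slot and its id is returned."""
        if self.running == 0:
            raise RuntimeError("no request is running")
        if self.queue:
            return self.queue.popleft()  # running count is unchanged
        self.running -= 1
        return None


gate = AdmissionGate(max_concurrent=2, queue_limit=1)
print([gate.arrive(r) for r in ["a", "b", "c", "d"]])  # ['run', 'run', 'queued', '503']
print(gate.complete())  # c
```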

That's assuming it's .NET Framework.

If, on the other hand, the app is ASP.NET Core, it depends on whether you're hosting in-process (the app runs inside the IIS worker process via the ASP.NET Core Module, ANCM) or out-of-process (ANCM proxies requests to Kestrel). REF https://video2.skills-academy.com/en-us/aspnet/core/host-and-deploy/iis/?view=aspnetcore-8.0

If you're using Kestrel, I don't believe it supports queuing, as per David Fowler's response here: https://stackoverflow.com/questions/71023595/what-is-the-default-queue-size-and-max-concurrent-connections-requests-supported

However, from an ASP.NET Core perspective, whether IIS proxies to Kestrel or the app runs in-process with ANCM, it does look like IIS is still responsible for concurrency limits; it's just that this isn't something you'd be able to override yourself in a web app. REF https://github.com/dotnet/aspnetcore/issues/45277


Now, the issue is interesting because it points at the more "correct" way to rate limit (and rate limiting is really what you're alluding to here), at least in ASP.NET Core. It's detailed here: https://video2.skills-academy.com/en-us/aspnet/core/performance/rate-limit?view=aspnetcore-8.0 and, because it's part of your application, it's something you control.
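The ASP.NET Core rate limiting middleware offers fixed window, sliding window, token bucket and concurrency limiters. To show the idea behind one of them, here is a token bucket in Python. The parameter names loosely mirror the TokenLimit, TokensPerPeriod and ReplenishmentPeriod options, but this is not that API, and queuing (QueueLimit) is omitted. The injectable clock is only there to make the example deterministic.

```python
import time
from typing import Callable


class TokenBucket:
    """Token bucket limiter: holds up to `capacity` tokens, adds back
    `tokens_per_period` every `period` seconds; each request spends one."""

    def __init__(self, capacity: int, tokens_per_period: int, period: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.tokens_per_period = tokens_per_period
        self.period = period
        self.clock = clock
        self.tokens = capacity
        self.last_refill = clock()

    def _refill(self) -> None:
        now = self.clock()
        periods = int((now - self.last_refill) // self.period)
        if periods > 0:
            self.tokens = min(self.capacity,
                              self.tokens + periods * self.tokens_per_period)
            self.last_refill += periods * self.period

    def try_acquire(self) -> bool:
        self._refill()
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False  # caller would reject, e.g. with 429 Too Many Requests


now = [0.0]  # fake clock for a deterministic demo
bucket = TokenBucket(capacity=3, tokens_per_period=1, period=1.0,
                     clock=lambda: now[0])
print([bucket.try_acquire() for _ in range(4)])  # [True, True, True, False]
now[0] = 1.0
print(bucket.try_acquire())  # True: one token replenished
```

Unlike the IIS queue settings, this lives in your own code, so the limit and the rejection behaviour are yours to choose.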

The thing is, I don't believe there's an equivalent for .NET Framework (as alluded to in the issue), unless you go back to the settings you mentioned, which I don't believe you can override in App Service.

As a final note: there is a proxy service in Azure, API Management (APIM), that allows rate limiting over an HTTP endpoint. This is supported in the Consumption APIM tier too: https://video2.skills-academy.com/en-us/azure/api-management/rate-limit-policy
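APIM's `rate-limit` policy takes a number of `calls` per `renewal-period` and answers with 429 Too Many Requests once that budget is used up. A hypothetical fixed-window counter keyed by subscription captures the idea; `FixedWindowLimiter` is an illustration, and APIM's own counting algorithm and distributed counters are not modelled here.

```python
import time
from typing import Callable


class FixedWindowLimiter:
    """Per-key fixed-window counter: at most `calls` requests per key in
    each `renewal_period`-second window."""

    def __init__(self, calls: int, renewal_period: float,
                 clock: Callable[[], float] = time.monotonic):
        self.calls = calls
        self.renewal_period = renewal_period
        self.clock = clock
        self.windows: dict[str, tuple[int, int]] = {}  # key -> (window, count)

    def check(self, key: str) -> int:
        window = int(self.clock() // self.renewal_period)
        win, count = self.windows.get(key, (window, 0))
        if win != window:
            count = 0  # a new window has begun for this key
        if count >= self.calls:
            return 429  # budget used up for this window
        self.windows[key] = (window, count + 1)
        return 200


now = [0.0]  # fake clock for a deterministic demo
limiter = FixedWindowLimiter(calls=2, renewal_period=60, clock=lambda: now[0])
print([limiter.check("sub-A") for _ in range(3)])  # [200, 200, 429]
print(limiter.check("sub-B"))  # 200: each subscription has its own budget
now[0] = 60.0
print(limiter.check("sub-A"))  # 200: a new window has started
```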

If you were feeling adventurous and up for the effort, hosting your own .NET reverse proxy (YARP), for example in a function, would also let you do the same: https://microsoft.github.io/reverse-proxy/articles/rate-limiting.html
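The proxy approach boils down to checking a limit before forwarding the request upstream, so rejected calls never reach the web app. A sketch of that idea using a sliding log per client IP (a simpler variant than the segmented sliding window that ASP.NET Core provides); `SlidingWindowProxy` is hypothetical and `forward` stands in for the upstream call.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowProxy:
    """Toy rate-limiting reverse proxy: each client IP may make at most
    `limit` requests in any rolling `window`-second span; allowed requests
    are passed to `forward` (a stand-in for the upstream web app)."""

    def __init__(self, limit: int, window: float, forward: Callable[[str], int],
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.forward = forward
        self.clock = clock
        self.history = defaultdict(deque)  # client IP -> request timestamps

    def handle(self, client_ip: str, path: str) -> int:
        now = self.clock()
        stamps = self.history[client_ip]
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()  # drop requests that have left the window
        if len(stamps) >= self.limit:
            return 429  # rejected at the proxy; never reaches the app
        stamps.append(now)
        return self.forward(path)


now = [0.0]  # fake clock for a deterministic demo
upstream_calls = []


def fake_upstream(path: str) -> int:
    upstream_calls.append(path)
    return 200


proxy = SlidingWindowProxy(limit=2, window=10.0, forward=fake_upstream,
                           clock=lambda: now[0])
print([proxy.handle("10.0.0.1", "/api") for _ in range(3)])  # [200, 200, 429]
now[0] = 10.0
print(proxy.handle("10.0.0.1", "/api"))  # 200
print(len(upstream_calls))  # 3
```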
