Compute throttling limits
Applies to: ✔️ Linux VMs ✔️ Windows VMs ✔️ Flexible scale sets ✔️ Uniform scale sets
Microsoft Compute implements a throttling mechanism to protect the overall performance of the service and to give customers a consistent experience. API requests that exceed the maximum allowed limits are throttled, and users receive an HTTP 429 error. All Compute throttling policies are implemented on a per-region basis.
How do the throttling policies work?
Microsoft Compute implements throttling policies that limit the number of API requests made per resource and per subscription per region per minute. If the number of API requests exceeds these limits, the requests are throttled. Here's how these limits work:
Per Resource Limit – Each resource, such as a virtual machine (VM), has a specific limit for API requests. For instance, let us assume that a user creates 10 VMs in a subscription. The user can invoke up to 12 update requests for each VM in one minute. If the user exceeds the limit for the VM, API requests are throttled. This limit ensures that a few resources don’t consume the subscription level limits and throttle other resources.
Subscription Limit – In addition to resource limits, there's an overarching limit on the number of API requests across all resources within a subscription. Any API requests beyond this limit are throttled, regardless of whether the limit for an individual resource has been reached. For instance, let us assume that a user has 200 VMs in a subscription. Even though the user is entitled to initiate up to 12 Update VM requests for each VM, the aggregate limit for Update VM API requests is capped at 1,500 per minute. Any Update VM API requests for the subscription beyond 1,500 in that minute are throttled.
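The interaction between the two limits can be illustrated with a small sketch. It is a simplified one-minute snapshot that assumes both buckets start full and ignores refills. The function name `admitted_requests`, the VM names, and the in-order processing of VMs are assumptions made for illustration, not a description of how the service schedules requests.

```python
PER_RESOURCE_LIMIT = 12      # Update VM requests per VM per minute (from the text)
SUBSCRIPTION_LIMIT = 1500    # Update VM requests per subscription per minute


def admitted_requests(requests_by_vm):
    """Return how many requests per VM pass both limits within one minute.

    Hypothetical helper: VMs are processed in dictionary order, which stands
    in for arrival order in a real workload.
    """
    admitted = {}
    budget = SUBSCRIPTION_LIMIT
    for vm, count in requests_by_vm.items():
        # A request must fit under the per-VM limit and the shared budget.
        allowed = min(count, PER_RESOURCE_LIMIT, budget)
        admitted[vm] = allowed
        budget -= allowed
    return admitted


# 200 VMs, each attempting the full 12 Update VM requests in the same minute.
demand = {f"vm-{i}": 12 for i in range(200)}
result = admitted_requests(demand)
print("attempted:", sum(demand.values()))
print("admitted:", sum(result.values()))
```

Under these assumptions, the 200 VMs attempt 2,400 requests in total, but only 1,500 are admitted because the subscription limit is reached first.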
How does Microsoft Compute determine throttling limits?
To determine the limits for each resource and subscription, Microsoft Compute uses the Token Bucket Algorithm. This algorithm creates a bucket for each limit and holds a specific number of tokens in each bucket. The number of tokens in a bucket represents the throttling limit at any given minute.
At the start of the throttling window, when the resource is created, the bucket is filled to its Maximum Capacity. Each API request initiated by the user consumes one token. When the token count reaches zero, subsequent API requests are throttled. The bucket is replenished with new tokens every minute, up to its Maximum Capacity, at a consistent rate called the Bucket Refill Rate. This applies to both resource and subscription buckets.
For example, consider a throttling policy for the Update VM API with a Bucket Refill Rate of four tokens per minute and a Maximum Bucket Capacity of 12 tokens. The user invokes the Update VM API for a virtual machine (VM) as shown in the following table. The bucket starts with 12 tokens. In the second minute, the user makes eight requests, leaving four tokens. The bucket then gains four tokens per minute and is full again at the start of the fourth minute. In the fourth minute, the user makes 13 requests: 12 consume all the tokens and one is throttled. In the fifth minute, the bucket is replenished with four new tokens, so four of the five requests succeed and Microsoft Compute throttles one for lack of tokens.
Time (min) | 1st | 2nd | 3rd | 4th | 5th | 6th |
---|---|---|---|---|---|---|
Number of tokens in the beginning (A) | 12 | 12 | 8 | 12 | 4 | 4 |
Requests per minute (B) | 0 | 8 | 0 | 13 | 5 | 0 |
Throttled requests (C) | 0 | 0 | 0 | 1 | 1 | 0 |
Remaining tokens at the end of period D = Max(A-B, 0) | 12 | 4 | 8 | 0 | 0 | 4 |
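The table above can be reproduced with a short simulation. This is a minimal sketch of a token bucket that refills once per minute, matching the example. The names `simulate` and `MinuteResult` are hypothetical, and the discrete per-minute refill is an assumption taken from the table rather than a statement about the service's internal implementation.

```python
from dataclasses import dataclass


@dataclass
class MinuteResult:
    tokens_at_start: int   # row A in the table
    requests: int          # row B
    throttled: int         # row C
    tokens_at_end: int     # row D = max(A - B, 0)


def simulate(requests_per_minute, refill_rate=4, capacity=12):
    """Replay per-minute request counts against a single token bucket."""
    tokens = capacity  # the bucket starts full
    results = []
    for requests in requests_per_minute:
        served = min(requests, tokens)      # each served request uses one token
        throttled = requests - served       # the rest would receive HTTP 429
        remaining = tokens - served
        results.append(MinuteResult(tokens, requests, throttled, remaining))
        # Refill at the minute boundary, never above the maximum capacity.
        tokens = min(remaining + refill_rate, capacity)
    return results


rows = simulate([0, 8, 0, 13, 5, 0])
for minute, row in enumerate(rows, start=1):
    print(minute, row.tokens_at_start, row.requests, row.throttled, row.tokens_at_end)
```

Passing the subscription-level values for `refill_rate` and `capacity` models a subscription bucket in the same way.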
A similar process determines the throttling limits at the subscription level. The following sections detail the Bucket Refill Rate and Maximum Bucket Capacity used to determine throttling limits for Virtual Machines, Virtual Machine Scale Sets, and Virtual Machine Scale Set VMs.
Throttling limits for Virtual Machines
API requests for Virtual Machines are categorized into seven distinct policies. Each policy has its own limits, depending on how resource intensive the API requests under that policy are. The following table lists these policies, the corresponding REST APIs, and their respective throttling limits:
Policy category | REST APIs | Resource level: bucket refill rate (per min) | Resource level: maximum bucket capacity (per min) | Subscription level: bucket refill rate (per min) | Subscription level: maximum bucket capacity (per min) |
---|---|---|---|---|---|
Put VM (Create new VMs) | Create | 4 | 12 | 500 | 1,500 |
Update VM (Update existing VMs) | Update Reapply Restart Power Off Start Generalize Convert To Managed Disks Redeploy Perform Maintenance Capture Run Command Create Or Update Extensions - Update Extensions - Delete Reimage Update Run Commands - Update Run Commands - Delete Run Commands - Create Or Update | 4 | 12 | 500 | 1,500 |
Delete VM (Delete VMs) | Delete Simulate Eviction Deallocate | 4 | 12 | 500 | 1,500 |
Low Cost Get VM (Get information on single VM) | Get Instance View Extensions - Get List Available Sizes Retrieve Boot Diagnostics Data Run Commands - Get By Virtual Machine Run Commands - List By Virtual Machine | 12 | 36 | 8,000 | 24,000 |
High Cost Get VM¹ (Get information on multiple VMs) | List List All List By Location | NA | NA | 300 | 900 |
Get Operation (Get information on async VM operations) | Status of asynchronous operations | 15 | 45 | 5,000 | 15,000 |
VM Guest Patch Operations (Assess & install guest patches) | Assess Patches Install Patches | 2 | 6 | 200 | 600 |

¹ Only subscription-level policies are applicable.
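For planning bulk operations, the refill rate and capacity can be used to estimate how long a batch of requests takes without hitting a limit. The sketch below uses the subscription-level Update VM values from the table (refill 500, capacity 1,500). The helper name `minutes_to_send` is hypothetical, and it assumes a full bucket at the start, a once-per-minute refill as in the earlier example, and no other callers sharing the bucket.

```python
def minutes_to_send(total_requests, refill_rate, capacity):
    """Minutes needed to issue total_requests without being throttled.

    Assumes the bucket starts full and every available token is spent
    each minute before the next refill.
    """
    if refill_rate <= 0 or capacity <= 0:
        raise ValueError("refill_rate and capacity must be positive")
    tokens = capacity
    remaining = total_requests
    minutes = 0
    while remaining > 0:
        minutes += 1
        spent = min(remaining, tokens)
        remaining -= spent
        # Refill at the minute boundary, capped at the bucket capacity.
        tokens = min(tokens - spent + refill_rate, capacity)
    return minutes


# Updating 3,000 VMs once each against the subscription-level Update VM policy.
print(minutes_to_send(3000, refill_rate=500, capacity=1500))
```

In this model the first minute absorbs 1,500 requests from the full bucket and each later minute absorbs 500, so the 3,000 updates spread over four minutes.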
Throttling limits for Virtual Machine Scale Sets
API requests for Virtual Machine Scale Sets are categorized into five distinct policies. Each policy has its own limits, depending on how resource intensive the API requests under that policy are. These policies apply to both the Flex and Uniform orchestration modes. The following table lists these policies, the corresponding REST APIs, and their respective throttling limits:
Policy category | REST APIs | Resource level: bucket refill rate (per min) | Resource level: maximum bucket capacity (per min) | Subscription level: bucket refill rate (per min) | Subscription level: maximum bucket capacity (per min) |
---|---|---|---|---|---|
Put (Create new scale set) | Create | 4 | 12 | 125 | 375 |
Update (Update existing scale set) | Update Start² Restart² Redeploy² Perform Maintenance² Reimage² Reimage All² Create Or Update Rolling Upgrades - Cancel Extensions - Create Extensions - Update Extensions - Delete Force Recovery Service Fabric Platform Update Domain Walk Convert To Single Placement Group Set Orchestration Service State | 4 | 12 | 500 | 1,500 |
Delete (Delete scale set) | Delete Power Off² Deallocate | 4 | 12 | 175 | 525 |
Low Cost Get (Get information on single scale set) | Get List Skus Rolling Upgrades - Get Latest Get OS Upgrade History | 12 | 36 | 800 | 2,400 |
High Cost Get (Get resource intensive information) | Get Instance View List² List All² List By Location² | 10 | 30 | 360 | 1,080 |

² Only subscription-level policies are applicable.
Throttling limits for Virtual Machine Scale Set Virtual Machines
API requests for Virtual Machine Scale Set Virtual Machines are categorized into three distinct policies. Each policy has its own limits, depending on how resource intensive the API requests under that policy are. The following table lists these policies, the corresponding REST APIs, and their respective throttling limits:
Policy category | REST APIs | Resource level: bucket refill rate (per min) | Resource level: maximum bucket capacity (per min) | Subscription level: bucket refill rate (per min) | Subscription level: maximum bucket capacity (per min) |
---|---|---|---|---|---|
Update scale set VMs (Update existing VMs in a scale set) | Start Restart Reimage ReimageAll Update SimulateEviction Extensions- Create Or Update RunCommands - Create Or Update RunCommands - Update | 4 | 12 | 500 | 1,500 |
Delete scale set VMs (Delete scale set VMs) | Delete PowerOff Deallocate Extensions- Delete RunCommands - Delete | 4 | 12 | 500 | 1,500 |
Get scale set VMs (Get information on scale set VMs) | Get GetInstance View Extensions- Get RunCommands - Get RetrieveBoot Diagnostics Data | 12 | 36 | 2,000 | 6,000 |
Troubleshooting guidelines
If users still face challenges because of Compute throttling, see Troubleshooting throttling errors in Azure - Virtual Machines. It explains how to troubleshoot throttling issues and lists best practices to avoid being throttled.
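One common client-side pattern for handling HTTP 429 is to pause and retry. The following is a minimal sketch, not an official client. It assumes a hypothetical `send` callable that returns `(status_code, headers, body)`, assumes `headers` is a plain dictionary, and assumes the response may carry a `Retry-After` header with a delay in seconds. When that header is missing or not numeric, it falls back to the one-minute throttling window described in this article.

```python
import time


def call_with_retry(send, max_attempts=5, default_wait=60.0, sleep=time.sleep):
    """Call send() until it stops returning HTTP 429 or attempts run out.

    send is a hypothetical callable returning (status_code, headers, body).
    """
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt == max_attempts:
            break
        # Prefer the server's hint (assumed header); otherwise wait one window.
        try:
            wait = float(headers.get("Retry-After", default_wait))
        except ValueError:
            wait = default_wait
        sleep(wait)
    raise RuntimeError(f"still throttled after {max_attempts} attempts")


# Demo with a fake sender and a recording sleep, so nothing actually waits.
fake_responses = iter([
    (429, {"Retry-After": "2"}, ""),
    (429, {}, ""),
    (200, {}, "ok"),
])
recorded_waits = []
final = call_with_retry(lambda: next(fake_responses), sleep=recorded_waits.append)
print(final[0], recorded_waits)
```

Injecting `sleep` keeps the helper testable. In real use, the default `time.sleep` applies the waits.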
FAQs
Is there any action required from users?
Users don’t need to change anything in their configuration or workloads. All existing APIs continue to work as is.
What benefits do the throttling policies provide?
The throttling policies offer several benefits:
All Compute resources have a uniform throttling window of one minute. Users can invoke API calls successfully one minute after being throttled.
No single resource can use up all the limits under a subscription, because limits are also defined at the resource level.
Microsoft Compute is introducing a new algorithm, the Token Bucket Algorithm, for determining the limits. The algorithm gives customers extra buffer when they make a high number of API requests.
Does the customer get an alert when they're about to reach their throttling limits?
Every response from Microsoft Compute includes the x-ms-ratelimit-remaining-resource header, which reports how many requests remain against the applicable throttling policies. The list of applicable throttling policies is returned in the response. For details, see Call rate informational headers.
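A client can read this header to see how close it is to a limit. The sketch below assumes the header value is a comma-separated list of `policy;remaining` pairs. That format is an assumption to verify against real responses, and the function name and the policy names in the sample value are hypothetical placeholders.

```python
def parse_remaining(header_value):
    """Parse 'PolicyA;12,PolicyB;340' into {'PolicyA': 12, 'PolicyB': 340}.

    Assumes comma-separated 'name;count' entries; other entries are skipped.
    """
    remaining = {}
    for entry in header_value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, _, count = entry.rpartition(";")
        if not name or not count.strip().isdigit():
            continue  # entry does not match the assumed shape
        remaining[name.strip()] = int(count)
    return remaining


# Hypothetical header value for illustration only.
sample = "Microsoft.Compute/ExamplePolicyResource;3,Microsoft.Compute/ExamplePolicySubscription;1490"
budget = parse_remaining(sample)
lowest = min(budget, key=budget.get)
print(lowest, budget[lowest])
```

Slowing down when the lowest remaining count approaches zero is one way to avoid HTTP 429 responses.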