In this article, you learn to manage resource usage in a deployment by configuring autoscaling based on metrics and schedules. The autoscale process lets you automatically run the right amount of resources to handle the load on your application. Online endpoints in Azure Machine Learning support autoscaling through integration with the autoscale feature in Azure Monitor.
Azure Monitor autoscale allows you to set rules that trigger one or more autoscale actions when conditions of the rules are met. You can configure metrics-based scaling (such as CPU utilization greater than 70%), schedule-based scaling (such as scaling rules for peak business hours), or a combination of the two. For more information, see Overview of autoscale in Microsoft Azure.
You can currently manage autoscaling by using the Azure CLI, the REST APIs, Azure Resource Manager, the Python SDK, or the browser-based Azure portal.
To use autoscale, the role microsoft.insights/autoscalesettings/write must be assigned to the identity that manages autoscale. You can use any built-in or custom roles that allow this action. For general guidance on managing roles for Azure Machine Learning, see Manage users and roles. For more on autoscale settings from Azure Monitor, see Microsoft.Insights autoscalesettings.
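For example, one way to grant this permission with the Azure CLI is to assign a built-in role that includes the action, such as Monitoring Contributor. The role choice and placeholder values here are illustrative; confirm the role meets your security requirements:

az role assignment create \
  --assignee <identity-object-id> \
  --role "Monitoring Contributor" \
  --scope /subscriptions/<subscription-id>/resourceGroups/<resource-group>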
To use the Python SDK to manage the Azure Monitor service, install the azure-mgmt-monitor package with the following command:
pip install azure-mgmt-monitor
Define autoscale profile
To enable autoscale for an online endpoint, you first define an autoscale profile. The profile specifies the default, minimum, and maximum scale set capacity. The following example shows how to set the number of virtual machine (VM) instances for the default, minimum, and maximum scale capacity.
If you haven't already set the defaults for the Azure CLI, save your default settings. To avoid passing in the values for your subscription, workspace, and resource group multiple times, run this code:
az account set --subscription <subscription ID>
az configure --defaults workspace=<Azure Machine Learning workspace name> group=<resource group>
Set the endpoint and deployment names:
# set your existing endpoint name
ENDPOINT_NAME=your-endpoint-name
DEPLOYMENT_NAME=blue
Get the Azure Resource Manager ID of the deployment and endpoint:
# ARM ID of the deployment
DEPLOYMENT_RESOURCE_ID=$(az ml online-deployment show -e $ENDPOINT_NAME -n $DEPLOYMENT_NAME -o tsv --query "id")
# ARM ID of the endpoint
ENDPOINT_RESOURCE_ID=$(az ml online-endpoint show -n $ENDPOINT_NAME -o tsv --query "properties.\"azureml.onlineendpointid\"")
# Set a unique name for the autoscale settings of this deployment by appending a random number
AUTOSCALE_SETTINGS_NAME=autoscale-$ENDPOINT_NAME-$DEPLOYMENT_NAME-$RANDOM
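With these variables in place, you can create the autoscale profile. For example, the following command sets the default and minimum capacity to two VM instances and the maximum to five:

az monitor autoscale create \
  --name $AUTOSCALE_SETTINGS_NAME \
  --resource $DEPLOYMENT_RESOURCE_ID \
  --min-count 2 \
  --max-count 5 \
  --count 2

To define the same profile with the Python SDK, start by importing the required modules: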
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient
from azure.mgmt.monitor.models import AutoscaleProfile, ScaleRule, MetricTrigger, ScaleAction, Recurrence, RecurrentSchedule
import random
import datetime
Define variables for the workspace, endpoint, and deployment:
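The snippets that follow assume management clients and resource objects like the ones in this sketch; the subscription, resource group, workspace, and endpoint values are placeholders to replace with your own:

subscription_id = "<YOUR-SUBSCRIPTION-ID>"
resource_group = "<YOUR-RESOURCE-GROUP>"
workspace_name = "<YOUR-WORKSPACE-NAME>"
endpoint_name = "<YOUR-ENDPOINT-NAME>"
deployment_name = "blue"

credential = DefaultAzureCredential()

# Client for the Azure Machine Learning workspace
ml_client = MLClient(credential, subscription_id, resource_group, workspace_name)

# Client for Azure Monitor autoscale settings
mon_client = MonitorManagementClient(credential, subscription_id)

# Retrieve the endpoint and deployment that the autoscale profile targets
endpoint = ml_client.online_endpoints.get(endpoint_name)
deployment = ml_client.online_deployments.get(deployment_name, endpoint_name=endpoint_name)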
# Set a unique name for autoscale settings for this deployment. The following code appends a random number to create a unique name.
autoscale_settings_name = f"autoscale-{endpoint_name}-{deployment_name}-{random.randint(0,1000)}"
# Create the autoscale profile; rules are added in the following sections
mon_client.autoscale_settings.create_or_update(
    resource_group,
    autoscale_settings_name,
    parameters={
        "location": endpoint.location,
        "target_resource_uri": deployment.id,
        "profiles": [
            AutoscaleProfile(
                name="my-scale-settings",
                capacity={"minimum": 2, "maximum": 5, "default": 2},
                rules=[],
            )
        ],
    },
)
In Azure Machine Learning studio, go to the Endpoints page. In the list of available endpoints, select the endpoint to configure.
On the Details tab for the selected endpoint, select Configure auto scaling.
For the Choose how to scale your resources option, select Custom autoscale to begin the configuration.
For the Default scale condition option, configure the following values:
Scale mode: Select Scale based on a metric.
Instance limits > Minimum: Set the value to 2.
Instance limits > Maximum: Set the value to 5.
Instance limits > Default: Set the value to 2.
Leave the configuration pane open. In the next section, you configure the Rules settings.
Create scale-out rule based on deployment metrics
A common scale-out rule is to increase the number of VM instances when the average CPU load is high. The following example shows how to allocate two more nodes (up to the maximum) if the average CPU load is greater than 70% for 5 minutes:
az monitor autoscale rule create \
--autoscale-name $AUTOSCALE_SETTINGS_NAME \
--condition "CpuUtilizationPercentage > 70 avg 5m" \
--scale out 2
The rule is part of the my-scale-settings profile, where autoscale-name matches the --name portion of the profile. The rule's condition argument states that the rule triggers when the average CPU consumption across the VM instances exceeds 70% for 5 minutes. When the condition is satisfied, two more VM instances are allocated.
In the Python SDK, the same behavior comes from the metric_name, time_window, and time_aggregation arguments: the rule evaluates the 5-minute average of the CpuUtilizationPercentage metric, and when that value exceeds the threshold of 70, the deployment allocates two more VM instances.
Update the my-scale-settings profile to include this rule:
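With the Python SDK, a sketch of the equivalent update, reusing the clients and objects defined earlier (the one-minute time grain and one-hour cooldown shown here are assumptions):

# Scale out by two instances when the 5-minute average CPU utilization exceeds 70%
rule_scale_out = ScaleRule(
    metric_trigger=MetricTrigger(
        metric_name="CpuUtilizationPercentage",
        metric_resource_uri=deployment.id,
        time_grain=datetime.timedelta(minutes=1),
        statistic="Average",
        operator="GreaterThan",
        time_aggregation="Average",
        time_window=datetime.timedelta(minutes=5),
        threshold=70,
    ),
    scale_action=ScaleAction(
        direction="Increase",
        type="ChangeCount",
        value=2,
        cooldown=datetime.timedelta(hours=1),
    ),
)

# Update the autoscale settings so the profile includes the new rule
mon_client.autoscale_settings.create_or_update(
    resource_group,
    autoscale_settings_name,
    parameters={
        "location": endpoint.location,
        "target_resource_uri": deployment.id,
        "profiles": [
            AutoscaleProfile(
                name="my-scale-settings",
                capacity={"minimum": 2, "maximum": 5, "default": 2},
                rules=[rule_scale_out],
            )
        ],
    },
)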
The following steps continue with the autoscale configuration.
For the Rules option, select the Add a rule link. The Scale rule page opens.
On the Scale rule page, configure the following values:
Metric name: Select CPU Utilization Percentage.
Operator: Set to Greater than.
Metric threshold: Set the value to 70.
Duration (minutes): Set the value to 5.
Time grain statistic: Select Average.
Operation: Select Increase count by.
Instance count: Set the value to 2.
Select Add to create the rule.
Leave the configuration pane open. In the next section, you adjust the Rules settings.
Create scale-in rule based on deployment metrics
When the average CPU load is light, a scale-in rule can reduce the number of VM instances. The following example shows how to release a single node, down to the minimum of two, if the CPU load is less than 30% for 5 minutes, as shown in the CLI example below.
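With the Azure CLI, the scale-in rule mirrors the scale-out rule, with the comparison and direction reversed:

az monitor autoscale rule create \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --condition "CpuUtilizationPercentage < 30 avg 5m" \
  --scale in 1

With the Python SDK, the equivalent rule reverses the earlier scale-out sketch: set the operator to LessThan, the threshold to 30, and the scale action direction to Decrease with a value of 1.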
The following steps adjust the Rules configuration to support a scale-in rule.
For the Rules option, select the Add a rule link. The Scale rule page opens.
On the Scale rule page, configure the following values:
Metric name: Select CPU Utilization Percentage.
Operator: Set to Less than.
Metric threshold: Set the value to 30.
Duration (minutes): Set the value to 5.
Time grain statistic: Select Average.
Operation: Select Decrease count by.
Instance count: Set the value to 1.
Select Add to create the rule.
If you configure both scale-out and scale-in rules, they work together: if the average CPU load exceeds 70% for 5 minutes, two more nodes are allocated, up to the limit of five. If the CPU load is less than 30% for 5 minutes, a single node is released, down to the minimum of two.
Leave the configuration pane open. In the next section, you specify other scale settings.
Create scale rule based on endpoint metrics
In the previous sections, you created rules to scale in or out based on deployment metrics. You can also create a rule that applies to the endpoint itself. In this section, you learn how to allocate another node when the request latency is greater than an average of 70 milliseconds for 5 minutes.
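With the Azure CLI, such a rule targets the endpoint resource instead of the deployment. A sketch, assuming the endpoint's RequestLatency metric and the ENDPOINT_RESOURCE_ID variable set earlier:

az monitor autoscale rule create \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --condition "RequestLatency > 70 avg 5m" \
  --scale out 1 \
  --resource $ENDPOINT_RESOURCE_ID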
If you want to use other metrics in code to set up autoscale rules by using the Azure CLI or the SDK, see the table in Available metrics.
Create scale rule based on schedule
You can also create rules that apply only on certain days or at certain times. In this section, you create a rule that sets the node count to 2 on the weekends.
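With the Azure CLI, you can add a recurring profile for this. The following sketch fixes the instance count at two on Saturdays and Sundays; the time zone shown is an assumption, so use your own:

az monitor autoscale profile create \
  --name weekend-profile \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --min-count 2 \
  --count 2 \
  --max-count 2 \
  --recurrence week sat sun \
  --timezone "Pacific Standard Time"

In the Python SDK, the Recurrence and RecurrentSchedule models imported earlier serve the same purpose when attached to an AutoscaleProfile.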
Enable or disable autoscale
To disable an autoscale profile in use, select Manual scale, and then select Save.
To enable an autoscale profile, select Custom autoscale. The studio lists all recognized autoscale profiles for the workspace. Select a profile, and then select Save to enable it.
Delete resources
If you're not going to use your deployments, delete the associated resources with the following commands.
# delete the autoscaling profile
az monitor autoscale delete -n "$AUTOSCALE_SETTINGS_NAME"
# delete the endpoint
az ml online-endpoint delete --name $ENDPOINT_NAME --yes --no-wait