Autoscale online endpoints in Azure Machine Learning

09/03/2024

Applies to: Azure CLI ml extension v2 (current), Python SDK azure-ai-ml v2 (current)

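In this article, you learn how to manage resource usage in a deployment by configuring autoscaling based on metrics and schedules. The autoscale process automatically runs the right amount of resources to handle the load on your application. Online endpoints in Azure Machine Learning support autoscaling through integration with the autoscale feature in Azure Monitor.

With Azure Monitor autoscale, you can set rules that trigger one or more autoscale actions when a rule's conditions are met. You can configure metrics-based scaling (such as CPU utilization greater than 70%), schedule-based scaling (such as scaling rules for peak business hours), or a combination of the two. For more information, see "Overview of autoscale in Microsoft Azure".

Figure: Diagram that shows how autoscale adds and removes instances as needed.

You can currently manage autoscaling by using the Azure CLI, the REST APIs, Azure Resource Manager, the Python SDK, or the browser-based Azure portal.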

Prerequisites

A deployed endpoint. For more information, see "Deploy and score a machine learning model by using an online endpoint".
To use autoscale, the identity that manages autoscale must be assigned a role that grants the microsoft.insights/autoscalesettings/write action. You can use any built-in or custom role that allows this action. For general guidance on managing roles for Azure Machine Learning, see "Manage users and roles". For more on autoscale settings from Azure Monitor, see "Microsoft.Insights autoscalesettings".
To use the Python SDK to manage the Azure Monitor service, install the azure-mgmt-monitor package with the following command:
```
pip install azure-mgmt-monitor
```

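Define autoscale profile

To enable autoscale for an online endpoint, you first define an autoscale profile. The profile specifies the default, minimum, and maximum scale set capacity. The following example shows how to set the number of virtual machine (VM) instances for the default, minimum, and maximum scale capacity.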
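To make the role of the capacity bounds concrete, here is a minimal sketch of how a requested instance count could be held inside a profile's limits. The `Capacity` class and its `clamp` method are hypothetical helpers written for illustration only; they are not part of the Azure SDK, and the values 2 and 5 are the ones used in this article's examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capacity:
    """Hypothetical stand-in for the capacity block of an autoscale profile."""

    minimum: int
    maximum: int
    default: int

    def __post_init__(self) -> None:
        # A profile only makes sense when the default sits between the bounds.
        if not (self.minimum <= self.default <= self.maximum):
            raise ValueError("expected minimum <= default <= maximum")

    def clamp(self, requested: int) -> int:
        """Keep a requested instance count inside [minimum, maximum]."""
        return max(self.minimum, min(self.maximum, requested))


capacity = Capacity(minimum=2, maximum=5, default=2)
print(capacity.clamp(7))  # a scale-out request beyond the maximum
print(capacity.clamp(1))  # a scale-in request below the minimum
```

Whatever the rules ask for, the resulting instance count is expected to stay between the minimum and the maximum of the active profile.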

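Applies to: Azure CLI ml extension v2 (current)

If you haven't already set the defaults for the Azure CLI, save your default settings. To avoid passing in the values for your subscription, workspace, and resource group multiple times, run this code: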

az account set --subscription <subscription ID>
az configure --defaults workspace=<Azure Machine Learning workspace name> group=<resource group>

Set the endpoint and deployment names:

# set your existing endpoint name
ENDPOINT_NAME=your-endpoint-name
DEPLOYMENT_NAME=blue

Get the Azure Resource Manager ID of the deployment and the endpoint:

# ARM id of the deployment
DEPLOYMENT_RESOURCE_ID=$(az ml online-deployment show -e $ENDPOINT_NAME -n $DEPLOYMENT_NAME -o tsv --query "id")
# ARM id of the endpoint
ENDPOINT_RESOURCE_ID=$(az ml online-endpoint show -n $ENDPOINT_NAME -o tsv --query "properties.\"azureml.onlineendpointid\"")
# set a unique name for autoscale settings for this deployment. The below will append a random number to make the name unique.
AUTOSCALE_SETTINGS_NAME=autoscale-$ENDPOINT_NAME-$DEPLOYMENT_NAME-`echo $RANDOM`

Create the autoscale profile:

az monitor autoscale create \
  --name $AUTOSCALE_SETTINGS_NAME \
  --resource $DEPLOYMENT_RESOURCE_ID \
  --min-count 2 --max-count 5 --count 2

Note

For more information, see the az monitor autoscale reference.

Applies to: Python SDK azure-ai-ml v2 (current)

Import the necessary modules:

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient
from azure.mgmt.monitor.models import AutoscaleProfile, ScaleRule, MetricTrigger, ScaleAction, Recurrence, RecurrentSchedule
import random 
import datetime

Define variables for the workspace, endpoint, and deployment:

subscription_id = "<YOUR-SUBSCRIPTION-ID>"
resource_group = "<YOUR-RESOURCE-GROUP>"
workspace = "<YOUR-WORKSPACE>"

endpoint_name = "<YOUR-ENDPOINT-NAME>"
deployment_name = "blue"

Get the Azure Machine Learning and Azure Monitor clients:

credential = DefaultAzureCredential()
ml_client = MLClient(
    credential, subscription_id, resource_group, workspace
)

mon_client = MonitorManagementClient(
    credential, subscription_id
)

Get the endpoint and deployment objects:

deployment = ml_client.online_deployments.get(
    deployment_name, endpoint_name
)

endpoint = ml_client.online_endpoints.get(
    endpoint_name
)
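The `id` attributes of these objects are Azure Resource Manager IDs, and the later rules depend on passing the right one. As a rough sketch, and assuming the usual ARM layout for Azure Machine Learning online endpoints, the IDs look like the strings built below. The helper names are hypothetical; in real code, prefer the `id` values returned by the service.

```python
def endpoint_resource_id(
    subscription_id: str, resource_group: str, workspace: str, endpoint_name: str
) -> str:
    """Build the assumed ARM ID of an online endpoint (illustration only)."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        "/providers/Microsoft.MachineLearningServices"
        f"/workspaces/{workspace}"
        f"/onlineEndpoints/{endpoint_name}"
    )


def deployment_resource_id(endpoint_id: str, deployment_name: str) -> str:
    """A deployment ID is assumed to extend the ID of its parent endpoint."""
    return f"{endpoint_id}/deployments/{deployment_name}"


# Placeholder values, not real resources.
endpoint_id = endpoint_resource_id("<SUB>", "<RG>", "<WORKSPACE>", "my-endpoint")
deployment_id = deployment_resource_id(endpoint_id, "blue")
print(endpoint_id)
print(deployment_id)
```

The deployment ID is the scaling target, while the endpoint ID only appears later as the source of endpoint-level metrics.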

Create an autoscale profile:

# Set a unique name for autoscale settings for this deployment. The following code appends a random number to create a unique name.
autoscale_settings_name = f"autoscale-{endpoint_name}-{deployment_name}-{random.randint(0,1000)}"

mon_client.autoscale_settings.create_or_update(
    resource_group, 
    autoscale_settings_name, 
    parameters = {
        "location" : endpoint.location,
        "target_resource_uri" : deployment.id,
        "profiles" : [
            AutoscaleProfile(
                name="my-scale-settings",
                capacity={
                    "minimum" : 2, 
                    "maximum" : 5,
                    "default" : 2
                },
                rules = []
            )
        ]
    }
)
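The same `parameters` payload is rebuilt every time a rule is added in the sections that follow. One possible way to cut the repetition is a small builder like the sketch below. `build_autoscale_parameters` is a hypothetical helper, not an SDK function, and the placeholder profile is a plain dict used only so that the sketch runs without Azure packages installed; in real code you would pass `AutoscaleProfile` objects as in the example above.

```python
from typing import Any


def build_autoscale_parameters(
    location: str, target_resource_uri: str, profiles: list[Any]
) -> dict[str, Any]:
    """Assemble the dict passed as `parameters` (hypothetical helper)."""
    if not profiles:
        raise ValueError("at least one profile is required")
    return {
        "location": location,
        "target_resource_uri": target_resource_uri,
        "profiles": list(profiles),  # copy so later edits do not leak in
    }


# Placeholder profile for illustration; values mirror the article's example.
placeholder_profile = {
    "name": "my-scale-settings",
    "capacity": {"minimum": 2, "maximum": 5, "default": 2},
    "rules": [],
}
parameters = build_autoscale_parameters(
    "<LOCATION>", "<DEPLOYMENT-RESOURCE-ID>", [placeholder_profile]
)
print(sorted(parameters))
```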

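Create scale-out rule based on deployment metrics

A common scale-out rule is to increase the number of VM instances when the average CPU load is high. The following example shows how to allocate two more nodes (up to the maximum) if the average CPU load is greater than 70% for 5 minutes.

Applies to: Azure CLI ml extension v2 (current)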

az monitor autoscale rule create \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --condition "CpuUtilizationPercentage > 70 avg 5m" \
  --scale out 2

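The rule is part of the my-scale-settings profile, where autoscale-name matches the name portion of the profile. The value of the rule's condition argument indicates that the rule triggers when "the average CPU consumption among the VM instances exceeds 70% for 5 minutes." When the condition is satisfied, two more VM instances are allocated.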
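To show how the parts of a condition string such as "CpuUtilizationPercentage > 70 avg 5m" line up with the fields of a metric trigger, here is a small illustrative parser. It is a sketch written for this article, not the parser the Azure CLI uses, and the lookup tables cover only the tokens assumed here (`>`, `<`, `avg`, and the `m` and `h` duration suffixes).

```python
import re
from datetime import timedelta

# Illustrative lookup tables; only a few tokens are covered on purpose.
OPERATORS = {">": "GreaterThan", "<": "LessThan"}
AGGREGATIONS = {"avg": "Average"}
UNITS = {"m": "minutes", "h": "hours"}

CONDITION = re.compile(
    r"^\s*(?P<metric>\w+)\s*(?P<op>[<>])\s*(?P<threshold>\d+(?:\.\d+)?)"
    r"\s+(?P<agg>\w+)\s+(?P<amount>\d+)(?P<unit>[a-z])\s*$"
)


def parse_condition(condition: str) -> dict:
    """Split '<metric> <op> <threshold> <aggregation> <duration>' into fields."""
    match = CONDITION.match(condition)
    if match is None:
        raise ValueError(f"unrecognized condition: {condition!r}")
    try:
        return {
            "metric_name": match["metric"],
            "operator": OPERATORS[match["op"]],
            "threshold": float(match["threshold"]),
            "aggregation": AGGREGATIONS[match["agg"]],
            "time_window": timedelta(**{UNITS[match["unit"]]: int(match["amount"])}),
        }
    except KeyError as exc:
        raise ValueError(f"unsupported token in condition: {exc}") from None


rule = parse_condition("CpuUtilizationPercentage > 70 avg 5m")
print(rule)
```

The resulting keys roughly correspond to the metric name, comparison operator, threshold, and time window that the Python SDK version of the rule spells out explicitly.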

Note

For more information, see the az monitor autoscale Azure CLI syntax reference.

Applies to: Python SDK azure-ai-ml v2 (current)

Create the rule definition:

rule_scale_out = ScaleRule(
    metric_trigger = MetricTrigger(
        metric_name="CpuUtilizationPercentage",
        metric_resource_uri = deployment.id, 
        time_grain = datetime.timedelta(minutes = 1),
        statistic = "Average",
        operator = "GreaterThan", 
        time_aggregation = "Last",
        time_window = datetime.timedelta(minutes = 5), 
        threshold = 70
    ), 
    scale_action = ScaleAction(
        direction = "Increase", 
        type = "ChangeCount", 
        value = 2, 
        cooldown = datetime.timedelta(hours = 1)
    )
)

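This rule refers to the last 5-minute average of the CpuUtilizationPercentage value, as specified by the metric_name, time_window, and time_aggregation arguments. When the value of the metric is greater than the threshold of 70, the deployment allocates two more VM instances.

Update the my-scale-settings profile to include this rule: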
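The decision this rule encodes can be sketched as a plain function: average the most recent one-minute samples over the window, compare against the threshold, and add instances without exceeding the profile maximum. This is a simplified model under stated assumptions, not how Azure Monitor evaluates rules internally; `evaluate_scale_out` and its parameters are hypothetical names, and the sample values are made up.

```python
from statistics import mean


def evaluate_scale_out(
    samples: list[float],
    current: int,
    *,
    threshold: float = 70.0,
    window: int = 5,
    step: int = 2,
    maximum: int = 5,
) -> int:
    """Return the instance count after applying a simplified scale-out rule.

    `samples` holds one-minute CPU percentages, oldest first.
    """
    recent = samples[-window:]
    if len(recent) < window:
        return current  # not enough data to cover the time window
    if mean(recent) > threshold:
        return min(maximum, current + step)
    return current


cpu = [60, 72, 75, 80, 78]  # made-up samples with an average of 73
print(evaluate_scale_out(cpu, current=2))
print(evaluate_scale_out(cpu, current=4))
```

With two instances running, the rule adds two more; with four running, the result is capped at the maximum of five.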

mon_client.autoscale_settings.create_or_update(
    resource_group, 
    autoscale_settings_name, 
    parameters = {
        "location" : endpoint.location,
        "target_resource_uri" : deployment.id,
        "profiles" : [
            AutoscaleProfile(
                name="my-scale-settings",
                capacity={
                    "minimum" : 2, 
                    "maximum" : 5,
                    "default" : 2
                },
                rules = [
                    rule_scale_out
                ]
            )
        ]
    }
)

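Create scale-in rule based on deployment metrics

When the average CPU load is light, a scale-in rule can reduce the number of VM instances. The following example shows how to release a single node, down to a minimum of two, if the CPU load is less than 30% for 5 minutes.

Applies to: Azure CLI ml extension v2 (current)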

az monitor autoscale rule create \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --condition "CpuUtilizationPercentage < 30 avg 5m" \
  --scale in 1

Applies to: Python SDK azure-ai-ml v2 (current)

Create the rule definition:

rule_scale_in = ScaleRule(
    metric_trigger = MetricTrigger(
        metric_name="CpuUtilizationPercentage",
        metric_resource_uri = deployment.id, 
        time_grain = datetime.timedelta(minutes = 1),
        statistic = "Average",
        operator = "LessThan", 
        time_aggregation = "Last",
        time_window = datetime.timedelta(minutes = 5), 
        threshold = 30
    ), 
    scale_action = ScaleAction(
        direction = "Decrease", 
        type = "ChangeCount", 
        value = 1, 
        cooldown = datetime.timedelta(hours = 1)
    )
)

Update the my-scale-settings profile to include this rule:

mon_client.autoscale_settings.create_or_update(
    resource_group, 
    autoscale_settings_name, 
    parameters = {
        "location" : endpoint.location,
        "target_resource_uri" : deployment.id,
        "profiles" : [
            AutoscaleProfile(
                name="my-scale-settings",
                capacity={
                    "minimum" : 2, 
                    "maximum" : 5,
                    "default" : 2
                },
                rules = [
                    rule_scale_out, 
                    rule_scale_in
                ]
            )
        ]
    }
)
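The scale-in side can be modeled the same way, this time including the one-hour cooldown that the rules in this article set on their scale actions. The sketch below is a simplified model and not Azure Monitor's actual evaluation logic; `evaluate_scale_in`, its parameters, and the sample data are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


def evaluate_scale_in(
    samples: list[float],
    current: int,
    now: datetime,
    last_scale_time: Optional[datetime],
    *,
    threshold: float = 30.0,
    window: int = 5,
    step: int = 1,
    minimum: int = 2,
    cooldown: timedelta = timedelta(hours=1),
) -> int:
    """Return the instance count after applying a simplified scale-in rule."""
    # Skip the rule while the previous scale action is still cooling down.
    if last_scale_time is not None and now - last_scale_time < cooldown:
        return current
    recent = samples[-window:]
    if len(recent) < window or mean(recent) >= threshold:
        return current
    return max(minimum, current - step)


now = datetime(2024, 9, 3, 12, 0)
idle = [20, 25, 22, 18, 15]  # made-up samples with an average of 20
print(evaluate_scale_in(idle, 4, now, None))
print(evaluate_scale_in(idle, 4, now, now - timedelta(minutes=30)))
print(evaluate_scale_in(idle, 2, now, None))
```

The three calls show a normal scale-in, a scale-in blocked by the cooldown, and a scale-in blocked by the profile minimum.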

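Create scale rule based on endpoint metrics

In the previous sections, you created rules to scale in or out based on deployment metrics. You can also create a rule that applies to the deployment's endpoint. In this section, you learn how to allocate another node when the request latency is greater than an average of 70 milliseconds for 5 minutes.

Applies to: Azure CLI ml extension v2 (current)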

az monitor autoscale rule create \
 --autoscale-name $AUTOSCALE_SETTINGS_NAME \
 --condition "RequestLatency > 70 avg 5m" \
 --scale out 1 \
 --resource $ENDPOINT_RESOURCE_ID

Applies to: Python SDK azure-ai-ml v2 (current)

Create the rule definition:

rule_scale_out_endpoint = ScaleRule(
    metric_trigger = MetricTrigger(
        metric_name="RequestLatency",
        metric_resource_uri = endpoint.id, 
        time_grain = datetime.timedelta(minutes = 1),
        statistic = "Average",
        operator = "GreaterThan", 
        time_aggregation = "Last",
        time_window = datetime.timedelta(minutes = 5), 
        threshold = 70
    ), 
    scale_action = ScaleAction(
        direction = "Increase", 
        type = "ChangeCount", 
        value = 1, 
        cooldown = datetime.timedelta(hours = 1)
    )
)

This rule's metric_resource_uri field now refers to the endpoint rather than the deployment.

Update the my-scale-settings profile to include this rule:

mon_client.autoscale_settings.create_or_update(
    resource_group, 
    autoscale_settings_name, 
    parameters = {
        "location" : endpoint.location,
        "target_resource_uri" : deployment.id,
        "profiles" : [
            AutoscaleProfile(
                name="my-scale-settings",
                capacity={
                    "minimum" : 2, 
                    "maximum" : 5,
                    "default" : 2
                },
                rules = [
                    rule_scale_out, 
                    rule_scale_in,
                    rule_scale_out_endpoint
                ]
            )
        ]
    }
)

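Find IDs for supported metrics

If you want to use other metrics when you set up autoscale rules with the Azure CLI or the SDK, see the table in "Available metrics".

Create scale rule based on schedule

You can also create rules that apply only on certain days or at certain times. In this section, you create a rule that sets the node count to 2 on the weekends.

Applies to: Azure CLI ml extension v2 (current)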

az monitor autoscale profile create \
  --name weekend-profile \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --min-count 2 --count 2 --max-count 2 \
  --recurrence week sat sun --timezone "Pacific Standard Time"

Applies to: Python SDK azure-ai-ml v2 (current)

mon_client.autoscale_settings.create_or_update(
    resource_group, 
    autoscale_settings_name, 
    parameters = {
        "location" : endpoint.location,
        "target_resource_uri" : deployment.id,
        "profiles" : [
            AutoscaleProfile(
                name="Default",
                capacity={
                    "minimum" : 2, 
                    "maximum" : 2,
                    "default" : 2
                },
                recurrence = Recurrence(
                    frequency = "Week", 
                    schedule = RecurrentSchedule(
                        time_zone = "Pacific Standard Time", 
                        days = ["Saturday", "Sunday"], 
                        hours = [], 
                        minutes = []
                    )
                )
            )
        ]
    }
)
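As a rough mental model of a weekly recurrence on Saturday and Sunday, the sketch below picks which profile would apply at a given moment. It assumes `now` is already expressed in the profile's time zone and ignores start hours and minutes, so it is deliberately simpler than the real schedule evaluation. The `active_profile` function is hypothetical, and the profile names are labels borrowed from this article's examples.

```python
from datetime import datetime


def active_profile(
    now: datetime,
    default_profile: str = "my-scale-settings",
    weekend_profile: str = "weekend-profile",
) -> str:
    """Pick the profile name for `now` (assumed to be local profile time)."""
    # datetime.weekday() returns 5 for Saturday and 6 for Sunday.
    return weekend_profile if now.weekday() >= 5 else default_profile


print(active_profile(datetime(2024, 9, 7, 10, 0)))  # a Saturday
print(active_profile(datetime(2024, 9, 3, 10, 0)))  # a Tuesday
```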

Enable or disable autoscale

You can enable or disable a specific autoscale profile.

Applies to: Azure CLI ml extension v2 (current)

az monitor autoscale update \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --enabled false

Applies to: Python SDK azure-ai-ml v2 (current)

mon_client.autoscale_settings.create_or_update(
    resource_group, 
    autoscale_settings_name, 
    parameters = {
        "location" : endpoint.location,
        "target_resource_uri" : deployment.id,
        "enabled" : False
    }
)

Delete resources

If you're not going to use your deployments, delete the resources with the following steps.

Applies to: Azure CLI ml extension v2 (current)

# delete the autoscaling profile
az monitor autoscale delete -n "$AUTOSCALE_SETTINGS_NAME"

# delete the endpoint
az ml online-endpoint delete --name $ENDPOINT_NAME --yes --no-wait

Applies to: Python SDK azure-ai-ml v2 (current)

mon_client.autoscale_settings.delete(
    resource_group, 
    autoscale_settings_name
)

ml_client.online_endpoints.begin_delete(endpoint_name)
