Azure AI Model Inference API | Azure AI Foundry

02/07/2025

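Important

Items marked (preview) in this article are currently in public preview. This preview is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

The Azure AI Model Inference is an API that exposes a common set of capabilities for foundational models. Developers can use it to consume predictions from a diverse set of models in a uniform and consistent way, and they can talk with different models deployed in Azure AI Foundry portal without changing the underlying code they're using.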

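Benefits

Foundational models, such as language models, have made remarkable strides in recent years. These advancements have revolutionized various fields, including natural language processing and computer vision, and they have enabled applications like chatbots, virtual assistants, and language translation services.

While foundational models excel in specific domains, they lack a uniform set of capabilities. Some models are better at specific tasks, and even within the same task, some models approach the problem in one way while others approach it in another. Developers can benefit from this diversity by using the right model for the right job, which allows them to:

Improve the performance in a specific downstream task.
Use more efficient models for simpler tasks.
Use smaller models that can run faster on specific tasks.
Compose multiple models to develop intelligent experiences.

Having a uniform way to consume foundational models allows developers to realize all those benefits without sacrificing portability or changing the underlying code.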

Availability

The Azure AI Model Inference API is available in the following models and systems:

Models deployed to serverless API endpoints:

Cohere Embed V3 family of models
Cohere Command R family of models
Meta Llama 2 chat family of models
Meta Llama 3 instruct family of models
Mistral-Small
Mistral-Large
Jais family of models
Jamba family of models
Phi-3 and Phi-4 families of models
DeepSeek-R1 family of models

Models deployed to managed inference:

Meta Llama 3 instruct family of models
Phi-3 and Phi-4 families of models
Mistral and Mixtral families of models

Models deployed to Azure AI model inference in Azure AI services:

See the article on supported models.

The API is compatible with Azure OpenAI model deployments.

Note

The Azure AI model inference API is available in managed inference (managed online endpoints) for models deployed after June 24, 2024. To take advantage of the API, redeploy your endpoint if the model was deployed before that date.

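Capabilities

The following section describes some of the capabilities the API exposes. For a full specification of the API, see the reference section.

Modalities

The API indicates how developers can consume predictions for the following modalities:

Get info: Returns the information about the model deployed under the endpoint.
Text embeddings: Creates an embedding vector representing the input text.
Chat completions: Creates a model response for the given chat conversation.
Image embeddings: Creates an embedding vector representing the input text and image.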

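Inference SDK support

You can use streamlined inference clients in the language of your choice to consume predictions from models running the Azure AI model inference API.

Important

When working with the Azure AI model inference endpoint (preview), the base URL to connect to has the form https://<resource-name>.services.ai.azure.com/models. Use this URL with the parameter endpoint. If you use the REST API, this is the base URL to which you append the route of the modality you want to consume. See the article on how to use the Azure AI model inference endpoint.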
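
As a rough illustration of how the base URL and a modality route combine for REST calls, the following Python sketch assembles a full request URL. The helper name build_inference_url and the route table are hypothetical. Only the /chat/completions route and the api-version value 2024-05-01-preview appear in this article; the other routes are assumptions to verify against the API reference.

```python
from urllib.parse import urlencode

# Hypothetical route table. Only "/chat/completions" is confirmed by this
# article; the remaining routes are assumptions to check against the reference.
ROUTES = {
    "info": "/info",
    "embeddings": "/embeddings",
    "chat_completions": "/chat/completions",
    "image_embeddings": "/images/embeddings",
}


def build_inference_url(resource_name: str, modality: str,
                        api_version: str = "2024-05-01-preview") -> str:
    """Join the base URL, the modality route, and the api-version query."""
    base_url = f"https://{resource_name}.services.ai.azure.com/models"
    query = urlencode({"api-version": api_version})
    return f"{base_url}{ROUTES[modality]}?{query}"


# "my-resource" is a placeholder resource name.
print(build_inference_url("my-resource", "chat_completions"))
```

The SDK clients shown in this section take only the base URL in their endpoint parameter, so a helper like this would matter only when calling the REST API directly.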

Install the package azure-ai-inference using your package manager, like pip:

pip install azure-ai-inference

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:

import os
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

model = ChatCompletionsClient(
    endpoint=os.environ["AZUREAI_ENDPOINT_URL"],
    credential=AzureKeyCredential(os.environ["AZUREAI_ENDPOINT_KEY"]),
)

If you're using an endpoint with support for Microsoft Entra ID, you can create your client as follows:

import os
from azure.ai.inference import ChatCompletionsClient
from azure.identity import DefaultAzureCredential

model = ChatCompletionsClient(
    endpoint=os.environ["AZUREAI_ENDPOINT_URL"],
    credential=DefaultAzureCredential(),
)

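Explore the samples and read the API reference documentation to get started.

Install the package @azure-rest/ai-inference using npm:

npm install @azure-rest/ai-inference

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions: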

import ModelClient from "@azure-rest/ai-inference";
import { isUnexpected } from "@azure-rest/ai-inference";
import { AzureKeyCredential } from "@azure/core-auth";

const client = new ModelClient(
    process.env.AZUREAI_ENDPOINT_URL, 
    new AzureKeyCredential(process.env.AZUREAI_ENDPOINT_KEY)
);

For endpoints with support for Microsoft Entra ID, you can create your client as follows:

import ModelClient from "@azure-rest/ai-inference";
import { isUnexpected } from "@azure-rest/ai-inference";
import { DefaultAzureCredential } from "@azure/identity";

const client = new ModelClient(
    process.env.AZUREAI_ENDPOINT_URL, 
    new DefaultAzureCredential()
);

Explore the samples and read the API reference documentation to get started.

Install the Azure AI inference library with the following command:

dotnet add package Azure.AI.Inference --prerelease

For endpoints with support for Microsoft Entra ID (formerly Azure Active Directory), install the Azure.Identity package:

dotnet add package Azure.Identity

Import the following namespaces:

using Azure;
using Azure.Identity;
using Azure.AI.Inference;

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:

ChatCompletionsClient client = new ChatCompletionsClient(
    new Uri(Environment.GetEnvironmentVariable("AZURE_INFERENCE_ENDPOINT")),
    new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_INFERENCE_CREDENTIAL"))
);

For endpoints with support for Microsoft Entra ID (formerly Azure Active Directory):

ChatCompletionsClient client = new ChatCompletionsClient(
    new Uri(Environment.GetEnvironmentVariable("AZURE_INFERENCE_ENDPOINT")),
    new DefaultAzureCredential(includeInteractiveCredentials: true)
);

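Explore the samples and read the API reference documentation to get started.

Use the reference section to explore the API design and the parameters that are available. For example, the reference section for chat completions details how to use the route /chat/completions to generate predictions based on chat-formatted instructions: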

Request

POST /chat/completions?api-version=2024-05-01-preview
Authorization: Bearer <bearer-token>
Content-Type: application/json
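
To make the shape of this request concrete, here is a minimal Python sketch that prepares the same POST with only the standard library. The function name build_chat_request and the placeholder values are illustrative assumptions, and the sketch builds the request without sending it.

```python
import json
import urllib.request


def build_chat_request(base_url: str, token: str,
                       messages: list) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the /chat/completions route."""
    url = f"{base_url.rstrip('/')}/chat/completions?api-version=2024-05-01-preview"
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder base URL and token; replace them with real values before sending.
request = build_chat_request(
    "https://my-resource.services.ai.azure.com/models",
    "<bearer-token>",
    [{"role": "user", "content": "How many languages are in the world?"}],
)
print(request.get_method(), request.full_url)
# Sending the request would be: urllib.request.urlopen(request)
```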

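Extensibility

The Azure AI Model Inference API specifies a set of modalities and parameters that models can subscribe to. However, some models may have capabilities beyond those that the API indicates. In those cases, the API allows the developer to pass them as extra parameters in the payload.

By setting the header extra-parameters: pass-through, the API attempts to pass any unknown parameter directly to the underlying model. If the model can handle that parameter, the request completes.

The following example shows a request passing the parameter safe_prompt supported by Mistral-Large, which isn't specified in the Azure AI Model Inference API.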

from azure.ai.inference.models import SystemMessage, UserMessage

response = model.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
    # model_extras carries parameters that the common API doesn't define.
    model_extras={
        "safe_prompt": True
    }
)

print(response.choices[0].message.content)

Tip

When you use the Azure AI Inference SDK, using model_extras configures the request with extra-parameters: pass-through automatically.

// JavaScript: send safe_prompt together with the pass-through header.
var messages = [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "How many languages are in the world?" },
];

var response = await client.path("/chat/completions").post({
    headers: {
        "extra-parameters": "pass-through"
    },
    body: {
        messages: messages,
        safe_prompt: true
    }
});

console.log(response.body.choices[0].message.content)

// C#: pass safe_prompt through to the model as an additional property.
var requestOptions = new ChatCompletionsOptions()
{
    Messages = {
        new ChatRequestSystemMessage("You are a helpful assistant."),
        new ChatRequestUserMessage("How many languages are in the world?")
    },
    AdditionalProperties = { { "safe_prompt", BinaryData.FromString("true") } },
};

var response = client.Complete(requestOptions, extraParams: ExtraParameters.PassThrough);
Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");

Request

POST /chat/completions?api-version=2024-05-01-preview
Authorization: Bearer <bearer-token>
Content-Type: application/json
extra-parameters: pass-through

{
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant"
        },
        {
            "role": "user",
            "content": "Explain Riemann's conjecture in 1 paragraph"
        }
    ],
    "temperature": 0,
    "top_p": 1,
    "response_format": { "type": "text" },
    "safe_prompt": true
}

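Note

The default value for extra-parameters is error, which returns an error if an extra parameter is indicated in the payload. Alternatively, you can set extra-parameters: drop to drop any unknown parameter from the request. Use this capability when you're sending requests with extra parameters that you know the model won't support, but you want the request to complete anyway. A typical example of this is the seed parameter.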
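
The three header values can be compared side by side with a small local simulation. This Python sketch is only a mental model of the behavior described above, not the service's implementation. The function apply_extra_parameters and the KNOWN_PARAMETERS set are hypothetical, and the set is made up for illustration.

```python
# Hypothetical set of parameters defined by the common API (illustrative only).
KNOWN_PARAMETERS = {"messages", "temperature", "top_p", "response_format"}


def apply_extra_parameters(payload: dict, mode: str = "error") -> dict:
    """Return the payload a model would receive under the given header value."""
    extras = set(payload) - KNOWN_PARAMETERS
    if mode == "pass-through":
        # Unknown parameters are forwarded to the model untouched.
        return dict(payload)
    if mode == "drop":
        # Unknown parameters are silently removed.
        return {k: v for k, v in payload.items() if k in KNOWN_PARAMETERS}
    if mode == "error":
        # Default: any unknown parameter rejects the request.
        if extras:
            raise ValueError(f"Unknown parameters: {sorted(extras)}")
        return dict(payload)
    raise ValueError(f"Unsupported mode: {mode}")


payload = {"messages": [], "temperature": 0, "seed": 42}
print(apply_extra_parameters(payload, "drop"))          # seed is removed
print(apply_extra_parameters(payload, "pass-through"))  # seed is kept
```

With the default mode, the same payload raises an error, which mirrors the request being rejected.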

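Models with disparate set of capabilities

The Azure AI Model Inference API indicates a general set of capabilities, but each model can decide whether to implement them. A specific error is returned in those cases where the model can't support a specific parameter.

The following example shows the response for a chat completion request that indicates the parameter response_format and asks for a reply in JSON format. In the example, since the model doesn't support such capability, an error 422 is returned to the user.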

import json
from azure.ai.inference.models import SystemMessage, UserMessage, ChatCompletionsResponseFormatJSON
from azure.core.exceptions import HttpResponseError

try:
    response = model.complete(
        messages=[
            SystemMessage(content="You are a helpful assistant."),
            UserMessage(content="How many languages are in the world?"),
        ],
        response_format=ChatCompletionsResponseFormatJSON()
    )
except HttpResponseError as ex:
    if ex.status_code == 422:
        response = json.loads(ex.response._content.decode('utf-8'))
        if isinstance(response, dict) and "detail" in response:
            for offending in response["detail"]:
                param = ".".join(offending["loc"])
                value = offending["input"]
                print(
                    f"Looks like the model doesn't support the parameter '{param}' with value '{value}'"
                )
    else:
        raise ex

try {
    var messages = [
        { role: "system", content: "You are a helpful assistant" },
        { role: "user", content: "How many languages are in the world?" },
    ];
    
    var response = await client.path("/chat/completions").post({
        body: {
            messages: messages,
            response_format: { type: "json_object" }
        }
    });
}
catch (error) {
    if (error.status_code == 422) {
        var response = JSON.parse(error.response._content)
        if (response.detail) {
            for (const offending of response.detail) {
                var param = offending.loc.join(".")
                var value = offending.input
                console.log(`Looks like the model doesn't support the parameter '${param}' with value '${value}'`)
            }
        }
    }
    else 
    {
        throw error
    }
}

try
{
    requestOptions = new ChatCompletionsOptions()
    {
        Messages = {
            new ChatRequestSystemMessage("You are a helpful assistant"),
            new ChatRequestUserMessage("How many languages are in the world?"),
        },
        ResponseFormat = new ChatCompletionsResponseFormatJSON()
    };

    response = client.Complete(requestOptions);
    Console.WriteLine(response.Value.Choices[0].Message.Content);
}
catch (RequestFailedException ex)
{
    if (ex.Status == 422)
    {
        Console.WriteLine($"Looks like the model doesn't support a parameter: {ex.Message}");
    }
    else
    {
        throw;
    }
}

Request

POST /chat/completions?api-version=2024-05-01-preview
Authorization: Bearer <bearer-token>
Content-Type: application/json

{
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant"
        },
        {
            "role": "user",
            "content": "Explain Riemann's conjecture in 1 paragraph"
        }
    ],
    "temperature": 0,
    "top_p": 1,
    "response_format": { "type": "json_object" }
}

Response

{
    "status": 422,
    "code": "parameter_not_supported",
    "detail": {
        "loc": [ "body", "response_format" ],
        "input": "json_object"
    },
    "message": "One of the parameters contain invalid values."
}

Tip

You can inspect the property detail.loc to understand the location of the offending parameter, and detail.input to see the value that was passed in the request.
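
The following Python sketch shows one way to read those two properties from a raw 422 body. The helper describe_unsupported is hypothetical. Because the sample response above shows detail as a single object while the SDK snippets iterate over it as a list, the sketch accepts both shapes as a precaution.

```python
import json


def describe_unsupported(body: str) -> list:
    """List (location, value) pairs found in a 422 response body."""
    detail = json.loads(body).get("detail", [])
    # Accept either a single object or a list of objects (an assumption).
    items = [detail] if isinstance(detail, dict) else detail
    return [(".".join(item["loc"]), item["input"]) for item in items]


# The response body shown above, copied as a string.
sample = """{
    "status": 422,
    "code": "parameter_not_supported",
    "detail": {"loc": ["body", "response_format"], "input": "json_object"},
    "message": "One of the parameters contain invalid values."
}"""

for location, value in describe_unsupported(sample):
    print(f"Unsupported parameter '{location}' with value '{value}'")
```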

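Content safety

The Azure AI model inference API supports Azure AI Content Safety. When you use deployments with Azure AI Content Safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering (preview) system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.

The following example shows the response for a chat completion request that has triggered content safety.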

import json
from azure.ai.inference.models import AssistantMessage, UserMessage, SystemMessage
from azure.core.exceptions import HttpResponseError

try:
    response = model.complete(
        messages=[
            SystemMessage(content="You are an AI assistant that helps people find information."),
            UserMessage(content="Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."),
        ]
    )

    print(response.choices[0].message.content)

except HttpResponseError as ex:
    if ex.status_code == 400:
        response = json.loads(ex.response._content.decode('utf-8'))
        if isinstance(response, dict) and "error" in response:
            print(f"Your request triggered an {response['error']['code']} error:\n\t {response['error']['message']}")
        else:
            raise ex
    else:
        raise ex

try {
    var messages = [
        { role: "system", content: "You are an AI assistant that helps people find information." },
        { role: "user", content: "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills." },
    ]

    var response = await client.path("/chat/completions").post({
        body: {
            messages: messages,
        }
    });
    
    console.log(response.body.choices[0].message.content)
}
catch (error) {
    if (error.status_code == 400) {
        var response = JSON.parse(error.response._content)
        if (response.error) {
            console.log(`Your request triggered an ${response.error.code} error:\n\t ${response.error.message}`)
        }
        else
        {
            throw error
        }
    }
}

try
{
    requestOptions = new ChatCompletionsOptions()
    {
        Messages = {
            new ChatRequestSystemMessage("You are an AI assistant that helps people find information."),
            new ChatRequestUserMessage(
                "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."
            ),
        },
    };

    response = client.Complete(requestOptions);
    Console.WriteLine(response.Value.Choices[0].Message.Content);
}
catch (RequestFailedException ex)
{
    if (ex.ErrorCode == "content_filter")
    {
        Console.WriteLine($"Your query has triggered Azure Content Safety: {ex.Message}");
    }
    else
    {
        throw;
    }
}

Request

POST /chat/completions?api-version=2024-05-01-preview
Authorization: Bearer <bearer-token>
Content-Type: application/json

{
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant"
        },
        {
            "role": "user",
            "content": "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."
        }
    ],
    "temperature": 0,
    "top_p": 1
}

Response

{
    "status": 400,
    "code": "content_filter",
    "message": "The response was filtered",
    "param": "messages",
    "type": null
}
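
When you call the REST API directly, a response like this one can be recognized with a small check. The Python sketch below is a hedged example. The helper is_content_filter_error is hypothetical, and it looks for the code both at the top level (as in the response above) and under a nested error object (as the SDK snippets read it), since the exact shape may vary.

```python
import json


def is_content_filter_error(status: int, body: str) -> bool:
    """Return True when a 400 response carries the content_filter code."""
    if status != 400:
        return False
    payload = json.loads(body)
    # Fall back to the top-level object when there is no nested "error".
    error = payload.get("error", payload)
    return isinstance(error, dict) and error.get("code") == "content_filter"


# The response body shown above, copied as a string.
sample = """{
    "status": 400,
    "code": "content_filter",
    "message": "The response was filtered",
    "param": "messages",
    "type": null
}"""

print(is_content_filter_error(400, sample))
```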

Getting started

The Azure AI Model Inference API is currently supported in certain models deployed as serverless API endpoints and managed online endpoints. Deploy any of the supported models and use the exact same code to consume their predictions.

The client library azure-ai-inference does inference, including chat completions, for AI models deployed by Azure AI Foundry and Azure Machine Learning studio. It supports serverless API endpoints and managed compute endpoints (formerly known as managed online endpoints).

Explore the samples and read the API reference documentation to get started.

The client library @azure-rest/ai-inference does inference, including chat completions, for AI models deployed by Azure AI Foundry and Azure Machine Learning studio. It supports serverless API endpoints and managed compute endpoints (formerly known as managed online endpoints).

Explore the samples and read the API reference documentation to get started.

The client library Azure.AI.Inference does inference, including chat completions, for AI models deployed by Azure AI Foundry and Azure Machine Learning studio. It supports serverless API endpoints and managed compute endpoints (formerly known as managed online endpoints).

Explore the samples and read the API reference documentation to get started.
