Add and configure models to Azure AI model inference service
You can decide and configure which models are available for inference in the resource's model inference endpoint. When a given model is configured, you can then generate predictions from it by indicating its model name or deployment name on your requests. No further changes are required in your code to use it.
In this article, you learn how to add a new model to the Azure AI model inference service in Azure AI services.
Prerequisites
To complete this article, you need:
- An Azure subscription. If you're using GitHub Models, you can upgrade your experience and create an Azure subscription in the process. Learn more at Upgrade from GitHub Models to Azure AI Models in AI Services.
- An Azure AI services resource. For more information, see Create an Azure AI Services resource.
Add a model
As opposite to GitHub Models where all the models are already configured, the Azure AI Services resource allows you to control which models are available in your endpoint and under which configuration.
You can add all the models you need in the endpoint by using Azure AI Studio for GitHub. In the following example, we add a Mistral-Large
model in the service:
Go to Model catalog section in Azure AI Studio for GitHub.
Scroll to the model you're interested in and select it.
You can review the details of the model in the model card.
Select Deploy.
For models providers that require extra terms of contract, you're asked to accept those terms. For instance, Mistral models ask you to accept other terms. Accept the terms on those cases by selecting Subscribe and deploy.
You can configure the deployment settings at this time. By default, the deployment receives the name of the model you're deploying. The deployment name is used in the
model
parameter for request to route to this particular model deployment. This setting allows you to also configure specific names for your models when you attach specific configurations. For instance,o1-preview-safe
for a model with a strict content safety content filter.
Tip
Each model may support different deployments types, providing different data residency or throughput guarantees. See deployment types for more details.
- Use the Customize option if you need to change settings like content filter or rate limiting (if available).
Select Deploy.
Once the deployment completes, the new model will be listed in the page and it's ready to be used.
Use the model
Deployed models in Azure AI services can be consumed using the Azure AI model's inference endpoint for the resource.
To use it:
Get the Azure AI model's inference endpoint URL and keys from the deployment page or the Overview page. If you're using Microsoft Entra ID authentication, you don't need a key.
Use the model inference endpoint URL and the keys from before when constructing your client. The following example uses the Azure AI Inference package:
Install the package
azure-ai-inference
using your package manager, like pip:pip install azure-ai-inference>=1.0.0b5
Warning
Azure AI Services resource requires the version
azure-ai-inference>=1.0.0b5
for Python.Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:
import os from azure.ai.inference import ChatCompletionsClient from azure.core.credentials import AzureKeyCredential client = ChatCompletionsClient( endpoint=os.environ["AZUREAI_ENDPOINT_URL"], credential=AzureKeyCredential(os.environ["AZUREAI_ENDPOINT_KEY"]), )
Explore our samples and read the API reference documentation to get yourself started.
When constructing your request, indicate the parameter
model
and insert the model deployment name you created.from azure.ai.inference.models import SystemMessage, UserMessage response = client.complete( messages=[ SystemMessage(content="You are a helpful assistant."), UserMessage(content="Explain Riemann's conjecture in 1 paragraph"), ], model="mistral-large" ) print(response.choices[0].message.content)
Tip
When using the endpoint, you can change the model
parameter to any available model deployment in your resource.
Additionally, Azure OpenAI models can be consumed using the Azure OpenAI service endpoint in the resource. This endpoint is exclusive for each model deployment and has its own URL.
Model deployment customization
When creating model deployments, you can configure other settings including content filtering and rate limits. To configure more settings, select the option Customize in the deployment wizard.
Note
Configurations may vary depending on the model you're deploying.