Emit metrics for consumption of large language model tokens

APPLIES TO: All API Management tiers

The llm-emit-token-metric policy sends metrics to Application Insights about the consumption of large language model (LLM) tokens through LLM APIs. Token count metrics include Total Tokens, Prompt Tokens, and Completion Tokens.

Note

Currently, this policy is in preview.

Note

Set the policy's elements and child elements in the order provided in the policy statement. Learn more about how to set or edit API Management policies.

Supported models

Use the policy with LLM APIs added to Azure API Management that are available through the Azure AI Model Inference API.

Prerequisites

  • One or more LLM APIs must be added to your API Management instance.
  • Your API Management instance must be integrated with Application Insights so that the policy can send metrics to it.

Policy statement

<llm-emit-token-metric
        namespace="metric namespace">
        <dimension name="dimension name" value="dimension value" />
        ...additional dimensions...
</llm-emit-token-metric>

Attributes

| Attribute | Description | Required | Default value |
| --------- | ----------- | -------- | ------------- |
| namespace | A string. Namespace of metric. Policy expressions aren't allowed. | No | API Management |
| value | Value of metric expressed as a double. Policy expressions are allowed. | No | 1 |
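
For illustration, the following minimal sketch sets both attributes explicitly (the MyLLM namespace is hypothetical; value is shown at its default of 1):

<!-- value accepts a double or a policy expression that returns one -->
<llm-emit-token-metric namespace="MyLLM" value="1">
    <dimension name="API ID" />
</llm-emit-token-metric>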

Elements

| Element | Description | Required |
| ------- | ----------- | -------- |
| dimension | Add one or more of these elements for each dimension included in the metric. | Yes |

dimension attributes

| Attribute | Description | Required | Default value |
| --------- | ----------- | -------- | ------------- |
| name | A string or policy expression. Name of the dimension. | Yes | N/A |
| value | A string or policy expression. Value of the dimension. Can be omitted only if name matches one of the default dimensions; in that case, the value is supplied based on the dimension name. | No | N/A |

Default dimension names that may be used without value

  • API ID
  • Operation ID
  • Product ID
  • User ID
  • Subscription ID
  • Location
  • Gateway ID
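
For example, the following sketch combines a default dimension, whose value is supplied automatically, with a custom dimension that requires an explicit value (the Region dimension name is illustrative):

<llm-emit-token-metric namespace="MyLLM">
    <!-- Default dimension: value is supplied based on the name -->
    <dimension name="Subscription ID" />
    <!-- Custom dimension: value is required -->
    <dimension name="Region" value="@(context.Deployment.Region)" />
</llm-emit-token-metric>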

Usage

  • Policy sections: inbound

Usage notes

  • This policy can be used multiple times per policy definition.
  • You can configure at most 10 custom dimensions for this policy.
  • Where available, values in the usage section of the response from the LLM API are used to determine token metrics (see the sketch after this list).
  • Certain LLM endpoints support streaming of responses. When stream is set to true in the API request to enable streaming, token metrics are estimated.
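
A minimal sketch of such a usage section, assuming an OpenAI-style chat completions response (the token counts are illustrative); these fields correspond to the Prompt Tokens, Completion Tokens, and Total Tokens metrics:

{
  "usage": {
    "prompt_tokens": 52,
    "completion_tokens": 128,
    "total_tokens": 180
  }
}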

Example

The following example sends LLM token count metrics to Application Insights along with User ID, Client IP, and API ID as dimensions.

<policies>
    <inbound>
        <llm-emit-token-metric namespace="MyLLM">
            <dimension name="User ID" />
            <dimension name="Client IP" value="@(context.Request.IpAddress)" />
            <dimension name="API ID" />
        </llm-emit-token-metric>
    </inbound>
    <outbound>
    </outbound>
</policies>
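
In this example, User ID and API ID are default dimensions, so their values are supplied automatically and the value attribute is omitted; Client IP is a custom dimension, so its value is set with a policy expression.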

For more information about working with policies, see how to set or edit API Management policies.