查询基础模型和外部模型

项目
10/03/2024

本文介绍如何为基础模型和外部模型设置查询请求的格式，并将其发送到模型服务终结点。

有关传统的 ML 或 Python 模型查询请求，请参阅查询自定义模型的服务终结点。

Databricks 模型服务支持基础模型 API 和外部模型，以访问生成式 AI 模型。模型服务使用与 OpenAI 兼容的统一 API 和 SDK 进行查询。这样，就可以跨受支持的云和提供商试验和自定义用于生产环境的生成式 AI 模型。

Mosaic AI 模型服务提供以下选项，可用于向提供基础模型或外部模型服务的终结点发送评分请求：

方法	详细信息
OpenAI 客户端	使用 OpenAI 客户端查询 Mosaic AI 模型服务终结点托管的模型。将提供终结点名称的模型指定为 `model` 输入。支持由基础模型 API 或外部模型提供的聊天、嵌入和完成模型。
Serving UI	在“服务终结点”页面中，选择“查询终结点”。插入 JSON 格式的模型输入数据，然后单击“发送请求”。如果模型记录了输入示例，请使用“显示示例”来加载该示例。
REST API	使用 REST API 调用和查询模型。有关详细信息，请参阅 POST /serving-endpoints/{name}/invocations。有关为多个模型提供服务的终结点的评分请求，请参阅查询终结点背后的单个模型。
MLflow 部署 SDK	使用 MLflow 部署 SDK 的 predict() 函数查询模型。
Databricks Python SDK	Databricks Python SDK 是 REST API 上的一个层。它处理低级别详细信息，例如身份验证，从而更轻松地与模型交互。
SQL 函数	使用 `ai_query` SQL 函数直接从 SQL 调用模型推理。请参阅使用 ai_query() 查询服务的模型。

要求

模型服务终结点。
Databricks 工作区位于受支持的区域中。
- 基础模型 API 区域
- 外部模型区域
若要通过 OpenAI 客户端 REST API 或 MLflow 部署 SDK 发送评分请求，你必须有 Databricks API 令牌。

重要

作为适用于生产场景的安全最佳做法，Databricks 建议在生产期间使用计算机到计算机 OAuth 令牌来进行身份验证。

对于测试和开发，Databricks 建议使用属于服务主体（而不是工作区用户）的个人访问令牌。若要为服务主体创建令牌，请参阅管理服务主体的令牌。

安装包

选择查询方法后，必须先将相应的包安装到群集。

OpenAI 客户端

若要使用 OpenAI 客户端，需要在群集上安装 openai 包。在笔记本或本地终端中运行以下命令：

!pip install openai

仅当在 Databricks Notebook 上安装包时，才需要满足以下条件

dbutils.library.restartPython()

REST API

Databricks Runtime 中提供对服务 REST API 的访问，供机器学习使用。

MLflow 部署 SDK

!pip install mlflow

仅当在 Databricks Notebook 上安装包时，才需要满足以下条件

dbutils.library.restartPython()

Databricks Python SDK

所有使用 Databricks Runtime 13.3 LTS 或更高版本的 Azure Databricks 群集上均已安装了 Databricks SDK for Python。对于使用 Databricks Runtime 12.2 LTS 及更低版本的 Azure Databricks 群集，必须先安装 Databricks SDK for Python。请参阅步骤 1：安装或升级 Databricks SDK for Python。

查询聊天完成模型

下面是查询聊天模型的示例。该示例适用于查询使用模型服务功能（基础模型 API 或外部模型）提供的聊天模型。

有关批量推理示例，请参阅使用基础模型 API 预配吞吐量进行批量推理。

OpenAI 客户端

下面是基础模型 API 按令牌付费终结点提供的 DBRX 指示模型的聊天请求，databricks-dbrx-instruct 在工作区中。

如果要使用 OpenAI 客户端，可将提供终结点名称的模型指定为 model 输入。以下示例假定在计算中安装了 Databricks API 令牌和 openai。还需要 Databricks 工作区实例才能将 OpenAI 客户端连接到 Databricks。


import os
import openai
from openai import OpenAI

client = OpenAI(
    api_key="dapi-your-databricks-token",
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

response = client.chat.completions.create(
    model="databricks-dbrx-instruct",
    messages=[
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is a mixture of experts model?",
      }
    ],
    max_tokens=256
)

REST API

重要

以下示例使用 REST API 参数来查询为基础模型提供服务的终结点。这些参数为公共预览版，定义可能会发生变化。请参阅 POST /service-endpoints/{name}/invocations。

下面是基础模型 API 按令牌付费终结点提供的 DBRX 指示模型的聊天请求，databricks-dbrx-instruct 在工作区中。

curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d '{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": " What is a mixture of experts model?"
    }
  ]
}' \
https://<workspace_host>.databricks.com/serving-endpoints/databricks-dbrx-instruct/invocations \

MLflow 部署 SDK

重要

以下示例使用 MLflow 部署 SDK 中的 predict() API。

下面是基础模型 API 按令牌付费终结点提供的 DBRX 指示模型的聊天请求，databricks-dbrx-instruct 在工作区中。


import mlflow.deployments

# Only required when running this example outside of a Databricks Notebook
export DATABRICKS_HOST="https://<workspace_host>.databricks.com"
export DATABRICKS_TOKEN="dapi-your-databricks-token"

client = mlflow.deployments.get_deploy_client("databricks")

chat_response = client.predict(
    endpoint="databricks-dbrx-instruct",
    inputs={
        "messages": [
            {
              "role": "user",
              "content": "Hello!"
            },
            {
              "role": "assistant",
              "content": "Hello! How can I assist you today?"
            },
            {
              "role": "user",
              "content": "What is a mixture of experts model??"
            }
        ],
        "temperature": 0.1,
        "max_tokens": 20
    }
)

Databricks Python SDK

下面是基础模型 API 按令牌付费终结点提供的 DBRX 指示模型的聊天请求，databricks-dbrx-instruct 在工作区中。

此代码必须在工作区的笔记本中运行。请参阅从 Azure Databricks 笔记本内使用 Databricks SDK for Python。

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import ChatMessage, ChatMessageRole

w = WorkspaceClient()
response = w.serving_endpoints.query(
    name="databricks-dbrx-instruct",
    messages=[
        ChatMessage(
            role=ChatMessageRole.SYSTEM, content="You are a helpful assistant."
        ),
        ChatMessage(
            role=ChatMessageRole.USER, content="What is a mixture of experts model?"
        ),
    ],
    max_tokens=128,
)
print(f"RESPONSE:\n{response.choices[0].message.content}")

LangChain

若要使用 LangChain 查询基础模型终结点，可以使用 ChatDatabricks ChatModel 类并指定 endpoint。

以下示例使用 LangChain 中的 ChatDatabricks ChatModel 类来查询基础模型 API 按令牌付费终结点 databricks-dbrx-instruct。

%pip install langchain-databricks

from langchain_core.messages import HumanMessage, SystemMessage
from langchain_databricks import ChatDatabricks

messages = [
    SystemMessage(content="You're a helpful assistant"),
    HumanMessage(content="What is a mixture of experts model?"),
]

llm = ChatDatabricks(endpoint_name="databricks-dbrx-instruct")
llm.invoke(messages)

SQL

重要

以下示例使用内置 SQL 函数 ai_query。此函数为公共预览版，定义可能会发生变化。请参阅使用 ai_query() 查询服务的模型。

以下是基础模型 API 按令牌付费终结点提供的 llama-2-70b-chat 聊天请求，databricks-llama-2-70b-chat 在工作区中。

注意

ai_query() 函数不支持为 DBRX 或 DBRX 指示模型提供服务的查询终结点。

SELECT ai_query(
    "databricks-llama-2-70b-chat",
    "Can you explain AI in ten words?"
  )

例如，以下是使用 REST API 时聊天模型的预期请求格式。对于外部模型，可以包含对给定提供程序和终结点配置有效的其他参数。请参阅其他查询参数。

{
  "messages": [
    {
      "role": "user",
      "content": "What is a mixture of experts model?"
    }
  ],
  "max_tokens": 100,
  "temperature": 0.1
}

以下是使用 REST API 发出的请求的预期响应格式：

{
  "model": "databricks-dbrx-instruct",
  "choices": [
    {
      "message": {},
      "index": 0,
      "finish_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 7,
    "completion_tokens": 74,
    "total_tokens": 81
  },
  "object": "chat.completion",
  "id": null,
  "created": 1698824353
}

查询嵌入模型

下面是基础模型 API 提供的 bge-large-en 模型的嵌入请求。该示例适用于查询使用模型服务功能（基础模型 API 或外部模型）提供的嵌入模型。

OpenAI 客户端

如果要使用 OpenAI 客户端，可将提供终结点名称的模型指定为 model 输入。以下示例假定已在群集上安装 Databricks API 令牌和 openai。


import os
import openai
from openai import OpenAI

client = OpenAI(
    api_key="dapi-your-databricks-token",
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

response = client.embeddings.create(
  model="databricks-bge-large-en",
  input="what is databricks"
)

REST API

重要

以下示例使用 REST API 参数来查询为基础模型和外部模型提供服务的终结点。这些参数为公共预览版，定义可能会发生变化。请参阅 POST /service-endpoints/{name}/invocations。


curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d  '{ "input": "Embed this sentence!"}' \
https://<workspace_host>.databricks.com/serving-endpoints/databricks-bge-large-en/invocations

MLflow 部署 SDK

重要

以下示例使用 MLflow 部署 SDK 中的 predict() API。


import mlflow.deployments

export DATABRICKS_HOST="https://<workspace_host>.databricks.com"
export DATABRICKS_TOKEN="dapi-your-databricks-token"

client = mlflow.deployments.get_deploy_client("databricks")

embeddings_response = client.predict(
    endpoint="databricks-bge-large-en",
    inputs={
        "input": "Here is some text to embed"
    }
)

Databricks Python SDK


from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import ChatMessage, ChatMessageRole

w = WorkspaceClient()
response = w.serving_endpoints.query(
    name="databricks-bge-large-en",
    input="Embed this sentence!"
)
print(response.data[0].embedding)

LangChain

若要使用 LangChain 中的 Databricks 基础模型 API 模型作为嵌入模型，请导入 DatabricksEmbeddings 类并指定 endpoint 参数，如下所示：

%pip install langchain-databricks

from langchain_databricks import DatabricksEmbeddings

embeddings = DatabricksEmbeddings(endpoint="databricks-bge-large-en")
embeddings.embed_query("Can you explain AI in ten words?")

SQL

重要

以下示例使用内置 SQL 函数 ai_query。此函数为公共预览版，定义可能会发生变化。请参阅使用 ai_query() 查询服务的模型。


SELECT ai_query(
    "databricks-bge-large-en",
    "Can you explain AI in ten words?"
  )

下面是嵌入模型的预期请求格式。对于外部模型，可以包含对给定提供程序和终结点配置有效的其他参数。请参阅其他查询参数。


{
  "input": [
    "embedding text"
  ]
}

下面是预期的响应格式：

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": []
    }
  ],
  "model": "text-embedding-ada-002-v2",
  "usage": {
    "prompt_tokens": 2,
    "total_tokens": 2
  }
}

查询文本完成模型

下面是基础模型 API 提供的 databricks-mixtral-8x7b-instruct 模型的完成请求。该示例适用于查询使用模型服务功能（基础模型 API 或外部模型）提供的聊天模型。有关参数和语法，请参阅完成任务。

OpenAI 客户端

如果要使用 OpenAI 客户端，可将提供终结点名称的模型指定为 model 输入。以下示例假定已在群集上安装 Databricks API 令牌和 openai。


import os
import openai
from openai import OpenAI

client = OpenAI(
    api_key="dapi-your-databricks-token",
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

completion = client.completions.create(
  model="databricks-mixtral-8x7b-instruct",
  prompt="what is databricks",
  temperature=1.0
)

REST API

重要


curl \
 -u token:$DATABRICKS_TOKEN \
 -X POST \
 -H "Content-Type: application/json" \
 -d '{"prompt": "What is a quoll?", "max_tokens": 64}' \
https://<workspace_host>.databricks.com/serving-endpoints/databricks-mixtral-8x7b-instruct/invocations

MLflow 部署 SDK

重要

以下示例使用 MLflow 部署 SDK 中的 predict() API。


import os
import mlflow.deployments

# Only required when running this example outside of a Databricks Notebook

os.environ['DATABRICKS_HOST'] = "https://<workspace_host>.databricks.com"
os.environ['DATABRICKS_TOKEN'] = "dapi-your-databricks-token"

client = mlflow.deployments.get_deploy_client("databricks")

completions_response = client.predict(
    endpoint="databricks-mixtral-8x7b-instruct",
    inputs={
        "prompt": "What is the capital of France?",
        "temperature": 0.1,
        "max_tokens": 10,
        "n": 2
    }
)

# Print the response
print(completions_response)

Databricks Python SDK

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import ChatMessage, ChatMessageRole

w = WorkspaceClient()
response = w.serving_endpoints.query(
    name="databricks-mixtral-8x7b-instruct",
    prompt="Write 3 reasons why you should train an AI model on domain specific data sets."
)
print(response.choices[0].text)

SQL

重要

以下示例使用内置 SQL 函数 ai_query。此函数为公共预览版，定义可能会发生变化。请参阅使用 ai_query() 查询服务的模型。

SELECT ai_query(
    "databricks-mpt-30b-instruct",
    "Can you explain AI in ten words?"
  )

以下是完成模型的预期请求格式。对于外部模型，可以包含对给定提供程序和终结点配置有效的其他参数。请参阅其他查询参数。

{
  "prompt": "What is mlflow?",
  "max_tokens": 100,
  "temperature": 0.1,
  "stop": [
    "Human:"
  ],
  "n": 1,
  "stream": false,
  "extra_params":{
    "top_p": 0.9
  }
}

下面是预期的响应格式：

{
  "id": "cmpl-8FwDGc22M13XMnRuessZ15dG622BH",
  "object": "text_completion",
  "created": 1698809382,
  "model": "gpt-3.5-turbo-instruct",
  "choices": [
    {
    "text": "MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It provides tools for tracking experiments, managing and deploying models, and collaborating on projects. MLflow also supports various machine learning frameworks and languages, making it easier to work with different tools and environments. It is designed to help data scientists and machine learning engineers streamline their workflows and improve the reproducibility and scalability of their models.",
    "index": 0,
    "logprobs": null,
    "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 83,
    "total_tokens": 88
  }
}

使用 AI 操场与支持的 LLM 聊天

可以使用 AI 操场与受支持的大语言模型进行交互。 AI 操场是类似聊天的环境，可以在该处测试、提示和比较 Azure Databricks 工作区的 LLM。

AI 操场

通过

查询基础模型和外部模型

要求

安装包

OpenAI 客户端

REST API

MLflow 部署 SDK

Databricks Python SDK

查询聊天完成模型

OpenAI 客户端

REST API

MLflow 部署 SDK

Databricks Python SDK

LangChain

SQL

查询嵌入模型

OpenAI 客户端

REST API

MLflow 部署 SDK

Databricks Python SDK

LangChain

SQL

查询文本完成模型

OpenAI 客户端

REST API

MLflow 部署 SDK

Databricks Python SDK

SQL

使用 AI 操场与支持的 LLM 聊天

其他资源

反馈

其他资源