使用 Azure AI 视频索引器创建语言模型提示内容

项目
10/14/2024

本文介绍如何使用 Azure AI 视频索引器创建语言模型提示内容。

先决条件

Azure AI 视频索引器帐户。

在浏览器中打开网页

如果已打开所需的网页，则更容易遵循这些说明。将以下参数复制并粘贴到你喜欢的文本编辑器中。下面是一个入门列表：

VI 帐户 ID：
视频文件 ID：
访问令牌：

在一个选项卡或窗口中打开Azure 门户，然后登录。
1. 导航到帐户 ID 的 Azure AI 视频索引器帐户页。
2. 导航到 Azure OpenAI 帐户页
  1. 从菜单中选择“模型部署”，然后选择“管理部署”。此时会打开 Azure OpenAI Studio。复制要使用的部署名称。
在另一个选项卡或窗口中打开 Azure AI 视频索引器 Web 门户并登录。
1. 导航到库页面，选择视频，然后右键单击该视频文件 ID。
在另一个选项卡或窗口中打开 Azure AI 视频索引器 API 并登录。

生成访问令牌

在Azure 门户中生成访问令牌：

导航到 Azure AI 视频索引器帐户。
从菜单中选择“管理”，然后选择“管理 API”。
在“权限类型”下拉列表中，选择“参与者”。
在“范围”下拉列表中，选择“帐户”。
然后选择“生成” 。生成访问令牌。
将访问令牌复制到剪贴板。

创建提示内容

创建可供文本编辑器使用的视频摘要所需的所有参数。

在 API 中或 Azure AI 视频索引器 API 页上，搜索提示。
选择“创建提示内容”旁边的“试用”按钮。此时会打开 API 参数窗格。
从位置下拉列表中选择位置。
在 accountId 字段中复制并粘贴 VI 帐户 ID。
复制视频文件 ID 并将其粘贴到 videoId 字段中。
从 modelName 下拉列表中选择想要使用 Llama2、Phi2、Phi3、Phi3_5、GPT3_5Turbo、GPT4、GPT4O、GPT4OMini 的语言模型。
从样式下拉列表中选择提示样式“完整”或“汇总”。
将剪贴板上的访问令牌粘贴到 accessToken 字段中。
选择Send。

如果请求没有任何问题，响应会显示 HTTP/1.1 202 已接受。

注意

对于与摘要相关的提示，建议选择“汇总”样式。对于搜索任务，包括回答有关视频的特定问题，我们建议使用搜索样式。

检查作业状态

提示作业需要几分钟才能完成。如果要检查作业状态，可以将作业 ID 与作业状态请求一起使用。

获取提示内容

选择“ 获取提示内容”。
在相应的字段中输入帐户 ID、视频 ID、摘要 ID 和访问令牌。
选择Send。

提示内容将出现在响应中。

示例响应


HTTP/1.1 200 OK

cache-control: max-age=18000, private
content-type: application/json; charset=utf-8
expires: Tue, 07 May 2024 04:57:23 GMT

{
    "partition": null,
    "name": "satya_nadella_build_keynote_2018",
    "sections": [{
        "id": 0,
        "start": "0:00:00.050033",
        "end": "0:02:06.700033",
        "content": "[Video title] satya_nadella_build_keynote_2018\n[Known people] Satya Nadella\n[Visual labels] indoor, 
        audience, clothing, person, presentation, human face, man, wall, footwear, glasses, auditorium, microphone\n[OCR] Satya, 
        Nadella, 0:01 / 4:59, Nadel, 8-8, Microsoft, oft, Satya Nadella, Chief Executive Officer, Micr, Microsoft Build, 
        Opportunity & Responsibility, Mi, Micros, croso\n[Transcript] Please welcome Satya Nadella.\nGood morning and welcome to 
        Bill 2018.\nWelcome to Seattle.\nIt's fantastic to see you all back here.\nYou know, this morning I got up and I was 
        reading the news and I hear Bill Gates is talking about stock and he's talking about the Apple stock.\nAnd I said, wow, in 
        the 30 years that at least I've known Bill, I've never seen him talk about stock.\nBut today must be a new day for sure 
        when you hear Bill talk about Apple stock.\nSo that's the new Microsoft for you.\nYou know last year we talked about 
        opportunity and responsibility and both those topics have been so far amplified.\nIt's unimaginable.\nIn fact, for the 
        first time here, last year is when I started talking about the Intelligent Edge, and 12 months after it's everywhere.\nIn 
        fact, at this conference it's going to be something that we will unpack in great detail.\nThe platform advances are pretty 
        amazing, but most importantly, it's the developers who are pushing these platform advances.\nSo to see the Intelligent 
        edge go from some sort of a conceptual frame to this real thing that's shaping the cloud is stunning."
    }, {
        "id": 1,
        "start": "0:02:06.700033",
        "end": "0:04:38.916667",
        "content": "[Video title] satya_nadella_build_keynote_2018\n[Known people] Satya Nadella\n[Detected objects] car\n[Visual 
        labels] indoor, audience, clothing, person, presentation, human face, man, footwear, glasses, outdoor\n[OCR] Microsoft, 
        Opportunity & Responsibility, ponsibility, ros, SPEA., 4:03 / 4:59\n[Transcript] Last year, we also talked about this 
        notion of responsibility and that none of us wanted to see a future that Huxley imagined or Orville imagined, and that's 
        now become a mainstream topic of discussion.\nAnd so I was thinking about the historical parallels, where there was this 
        much change, this kind of opportunity, this kind of tumultuous discussion.\nAnd I was reminded of a book that I read maybe 
        three years ago by Robert Gordon, The Rise and Fall of American Productivity or American Growth.\nAnd in there he in fact 
        talks about the industrial revolution and even contrasts it with the digital revolution.\nHe gives Peace C credit for the 
        last time digital technology showed up in our productivity stats, which is nice.\nBut in general he sort of talks about 
        what an amazing revolution the industrial revolution was in terms of its broad sectoral impact and productivity and growth.
        \nThis is a picture of New York City probably 1905, I'm told.\nFlat iron building and what you see is horse carriages.    
        \nAnd if you go to the next picture, this is 20 years after and you see all the artefacts of the industrial revolution and 
        it's diffusion.\nYou see the automobiles.\nThese buildings now are beginning to have sewage systems, drainage, air 
        conditioning is coming, Radios, telephones, high rises.\nIt's pretty amazing."
    }]
}

使用关键帧直观提示大型语言模型

Prompt 内容请求支持可以在提示中使用视觉输入的语言模型。选择 GPT-4V 模型时，可以包含关键帧作为提供给模型的提示的一部分。提示内容响应中返回的帧表示视频的关键帧。对于提供有限见解的视频，或者视频中没有脚本时，建议使用此功能。

创建并发送提示内容请求。
如上所述，提示的文本内容位于 JSON 响应中。
JSON 响应的“frames”部分中的每个字符串都是关键帧的 ID。
使用 ID 通过“获取视频项目下载 URL”请求下载关键帧项目。（还可以从 VI Web 门户下载项目。
拥有文本内容和关键帧项目后，可以将它们合并为所选 AI 模型的提示。

通过