Get optical character recognition (OCR) insights

09/03/2024

Optical character recognition (OCR)

OCR extracts text from images in media files, such as pictures, street signs, and products, to create insights.

OCR extracts insights from printed and handwritten text in over 50 languages, including from an image with text in multiple languages. For more information, see OCR supported languages.

For more information about OCR, see OCR technology.

OCR use cases

Deep searching media footage for images with signposts, street names or car license plates, for example, in law enforcement.
Extracting text from images in media files and then translating it into multiple languages in labels for accessibility, for example in media or entertainment.
Detecting brand names in images and tagging them for translation purposes, for example in advertising and branding.
Extracting text in images that is then automatically tagged and categorized for accessibility and future usage, for example to generate content at a news agency.
Extracting text in warnings in online instructions and then translating the text to comply with local standards, for example, e-learning instructions for using equipment.

View the insight JSON with the web portal

Once you have uploaded and indexed a video, insights are available in JSON format for download using the web portal.

Select the Library tab.
Select the media you want to work with.
Select Download and then Insights (JSON). The JSON file opens in a new browser tab.
Look for the key pairs described in the example response.

Use the API

Use the Get Video Index request. We recommend passing &includeSummarizedInsights=false.
Look for the key pairs described in the example response.
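
The request described above can be sketched in Python as follows. This is a minimal sketch, not a definitive client: the base URL, the path layout, and the `accessToken` query parameter are assumptions to check against the Get Video Index API reference, and `build_get_video_index_url` and all identifier values are hypothetical. Only `includeSummarizedInsights=false` comes from the recommendation above.

```python
from urllib.parse import urlencode

# Assumed base URL and path layout; confirm both against the API reference.
API_BASE = "https://api.videoindexer.ai"


def build_get_video_index_url(location, account_id, video_id, access_token):
    """Build a Get Video Index URL with summarized insights turned off."""
    query = urlencode({
        "accessToken": access_token,
        # Pass the literal string "false"; a Python False would encode as "False".
        "includeSummarizedInsights": "false",
    })
    return f"{API_BASE}/{location}/Accounts/{account_id}/Videos/{video_id}/Index?{query}"


# Placeholder values for illustration only.
url = build_get_video_index_url("trial", "my-account-id", "my-video-id", "my-token")
print(url)
```

Send a GET request to the resulting URL with any HTTP client and parse the response body as JSON to reach the OCR key pairs.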

Example response

    "ocr": [
        {
          "id": 1,
          "text": "2017 Ruler",
          "confidence": 0.4365,
          "left": 901,
          "top": 3,
          "width": 80,
          "height": 23,
          "angle": 0,
          "language": "en-US",
          "instances": [
            {
              "adjustedStart": "0:00:45.5",
              "adjustedEnd": "0:00:46",
              "start": "0:00:45.5",
              "end": "0:00:46"
            },
            {
              "adjustedStart": "0:00:55",
              "adjustedEnd": "0:00:55.5",
              "start": "0:00:55",
              "end": "0:00:55.5"
            }
          ]
        },
        {
          "id": 2,
          "text": "2017 Ruler postppu - PowerPoint",
          "confidence": 0.4712,
          "left": 899,
          "top": 4,
          "width": 262,
          "height": 48,
          "angle": 0,
          "language": "en-US",
          "instances": [
            {
              "adjustedStart": "0:00:44.5",
              "adjustedEnd": "0:00:45",
              "start": "0:00:44.5",
              "end": "0:00:45"
            }
          ]
        }
    ]
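
To show how the fields in the example response fit together, the following sketch lists each appearance of a detected text in time order. It assumes the `ocr` array has already been loaded into a Python list and that timestamps always follow the `H:MM:SS` or `H:MM:SS.f` pattern seen in the example. The helper names `parse_timestamp` and `list_ocr_appearances` are hypothetical.

```python
from datetime import timedelta


def parse_timestamp(value):
    """Convert an 'H:MM:SS' or 'H:MM:SS.f' string into a timedelta."""
    hours, minutes, seconds = value.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


def list_ocr_appearances(ocr_entries):
    """Return (text, start_seconds, end_seconds) per instance, ordered by start time."""
    rows = []
    for entry in ocr_entries:
        for instance in entry.get("instances", []):
            start = parse_timestamp(instance["start"]).total_seconds()
            end = parse_timestamp(instance["end"]).total_seconds()
            rows.append((entry["text"], start, end))
    return sorted(rows, key=lambda row: row[1])


# Trimmed copy of the example response (position and language fields omitted).
sample_ocr = [
    {"id": 1, "text": "2017 Ruler", "confidence": 0.4365,
     "instances": [{"start": "0:00:45.5", "end": "0:00:46"},
                   {"start": "0:00:55", "end": "0:00:55.5"}]},
    {"id": 2, "text": "2017 Ruler postppu - PowerPoint", "confidence": 0.4712,
     "instances": [{"start": "0:00:44.5", "end": "0:00:45"}]},
]

appearances = list_ocr_appearances(sample_ocr)
for text, start, end in appearances:
    print(f"{start:6.1f}s to {end:6.1f}s  {text}")
```

Converting the timestamps to seconds makes it straightforward to sort appearances or to seek to the matching position in a player.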

Important

It is important to read the transparency note overview for all VI features. Each insight also has transparency notes of its own:

OCR notes

Video Indexer has an OCR limit of 50,000 words per indexed video. Once the limit is reached, no additional OCR results are generated.
Carefully consider the accuracy of the results. To promote more accurate detections, check the quality of the image, because low-quality images might affect the detected insights.
Carefully consider the use of OCR for law enforcement. OCR might misread or not detect parts of the text. To ensure fair and high-quality VI determinations, combine OCR-based automation with human oversight.
When extracting handwritten text, avoid using the OCR results of signatures that are hard to read for both humans and machines. A better way to use OCR is to use it for detecting the presence of a signature for further analysis.
Don't use OCR for decisions that might have serious adverse impacts on individuals or groups. Machine learning models that extract text can result in undetected or incorrect text output. Decisions based on incorrect output could have serious adverse impacts that must be avoided. You should always include human review of decisions that have the potential for serious impacts on individuals.
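
One way to combine OCR-based automation with human oversight, as the notes above recommend, is to queue low-confidence results for manual review. The sketch below is illustrative only: the 0.8 threshold, the `split_for_review` helper, and the third sample entry are assumptions, not documented values. A confidence cutoff does not replace human review of decisions that could seriously affect individuals.

```python
# Hypothetical cutoff; choose a value that suits your own content and risk level.
REVIEW_THRESHOLD = 0.8


def split_for_review(ocr_entries, threshold=REVIEW_THRESHOLD):
    """Split OCR entries into (accepted, needs_review) lists by confidence."""
    accepted, needs_review = [], []
    for entry in ocr_entries:
        # Entries without a confidence value are treated as needing review.
        if entry.get("confidence", 0.0) >= threshold:
            accepted.append(entry)
        else:
            needs_review.append(entry)
    return accepted, needs_review


sample_ocr = [
    {"id": 1, "text": "2017 Ruler", "confidence": 0.4365},
    {"id": 2, "text": "2017 Ruler postppu - PowerPoint", "confidence": 0.4712},
    # Made-up entry added so that both lists are populated.
    {"id": 3, "text": "EXIT", "confidence": 0.93},
]

accepted, needs_review = split_for_review(sample_ocr)
print("accepted:", [entry["text"] for entry in accepted])
print("needs review:", [entry["text"] for entry in needs_review])
```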

OCR components

During the OCR procedure, images that contain text in a media file are processed as follows:

Component	Definition
Source file	The user uploads the source file for indexing.
Read model	Images are detected in the media file, and the text is then extracted and analyzed by Azure AI services.
Get read results model	The output of the extracted text is displayed in a JSON file.
Confidence value	The estimated confidence level of each word is calculated as a value in the range 0 to 1. The confidence score represents the certainty in the accuracy of the result. For example, an 82% certainty is represented as a 0.82 score.

Sample code

See all samples for VI
