Optical Character Recognition (U-SQL)
Summary
The OcrExtractor cognitive function detects and extracts text in an image. It analyzes images to detect embedded text and generates character streams.
Arguments
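OcrExtractor(string imgCol = "ImgData", string txtCol = "Text")
- imgCol: the name of the input column that holds the image data (default "ImgData").
- txtCol: the name of the output column that receives the extracted text (default "Text").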
Examples
- The examples can be executed in Visual Studio with the Azure Data Lake Tools plug-in.
- Ensure you have installed the cognitive assemblies; see Registering Cognitive Extensions in U-SQL for more information.
- The scripts can be executed locally if you first download the assemblies locally; see Enabling U-SQL Advanced Analytics for Local Execution for more information. An Azure subscription and an Azure Data Lake Analytics account are not needed when executing locally.
- You will need images accessible to your ADLA or local account.
- The examples use the table myImages from the example Load images to a table.
Extract text from the image using OCR Extractor
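REFERENCE ASSEMBLY ImageCommon;
REFERENCE ASSEMBLY ImageOcr;

// Run OCR over each image in dbo.myImages. With the default arguments, the
// processor reads the ImgData column and writes the extracted text to Text.
@ocrs =
    PROCESS dbo.myImages
    PRODUCE FileName,
            Text string
    READONLY FileName
    USING new Cognition.Vision.OcrExtractor();

// Write the file name and extracted text to a TSV file with a header row.
OUTPUT @ocrs
TO "/ReferenceGuide/Cognition/Vision/OcrExtractor.txt"
USING Outputters.Tsv(outputHeader: true);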
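The arguments listed under Arguments can also be passed explicitly. The following is a sketch under stated assumptions rather than an example from the reference guide. It assumes the image column of myImages is named ImgData (the documented default), and it renames the output column to OcrText, a hypothetical name chosen for illustration. Because txtCol changes the name of the column the processor writes, the PRODUCE clause must declare OcrText instead of Text. The output path is also hypothetical.

```usql
REFERENCE ASSEMBLY ImageCommon;
REFERENCE ASSEMBLY ImageOcr;

// Assumption: dbo.myImages stores image bytes in a column named ImgData.
// OcrText is a hypothetical output column name; PRODUCE must match txtCol.
@ocrsRenamed =
    PROCESS dbo.myImages
    PRODUCE FileName,
            OcrText string
    READONLY FileName
    USING new Cognition.Vision.OcrExtractor(imgCol: "ImgData", txtCol: "OcrText");

// Hypothetical output path, used only for illustration.
OUTPUT @ocrsRenamed
TO "/ReferenceGuide/Cognition/Vision/OcrExtractorRenamed.txt"
USING Outputters.Tsv(outputHeader: true);
```

Passing imgCol explicitly is only necessary when the image column does not use the default name; it is shown here so that both arguments appear in one call.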