Thai Language Support for Azure Cognitive Search Image OCR Skill.

Mullika Phanhong 20 Reputation points Microsoft Employee
2023-10-31T04:55:20.59+00:00

In Azure Cognitive Search, text can be extracted from images by adding the OCR skill, which is powered by Azure AI Vision OCR. That service does not currently support extracting Thai from printed text. Is there a roadmap for the OCR skill to support Thai (TH)?

In addition, instead of using the built-in skill, we are exploring the option of creating a custom skill by implementing a custom API powered by the Azure Document Intelligence Read API, which now supports Thai, following this tutorial: https://video2.skills-academy.com/en-us/training/modules/build-form-recognizer-custom-skill-for-azure-cognitive-search/. However, the customer wants to know whether there is a roadmap to integrate Document Intelligence capabilities natively as built-in skills (like the Azure AI services skills), without the need to implement an external API endpoint.
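For readers weighing the custom-skill route described above, here is a minimal sketch of what registering such an endpoint in a skillset might look like, written as a Python dictionary that serializes to the skill's JSON definition. The endpoint URL, the skill name, and the `image`/`text`/`thaiText` field names are placeholders chosen for illustration, and the `/document/normalized_images/*` context assumes the indexer is configured to generate normalized images.

```python
import json


def build_thai_ocr_skill(endpoint_uri: str) -> dict:
    """Build a custom Web API skill definition for a skillset.

    endpoint_uri is a hypothetical URL for the customer's own API
    that wraps the Document Intelligence Read model.
    """
    return {
        "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
        "name": "thai-ocr-custom-skill",  # placeholder name
        "description": "Calls an external API that runs OCR on Thai text",
        "uri": endpoint_uri,
        "httpMethod": "POST",
        "timeout": "PT30S",  # ISO 8601 duration
        "batchSize": 1,  # one image per request keeps the sketch simple
        # Assumes the indexer generates normalized images.
        "context": "/document/normalized_images/*",
        "inputs": [
            {"name": "image", "source": "/document/normalized_images/*"}
        ],
        "outputs": [
            {"name": "text", "targetName": "thaiText"}
        ],
    }


# example.invalid is a reserved placeholder domain, not a real endpoint.
skill = build_thai_ocr_skill("https://example.invalid/api/thai-ocr")
skill_json = json.dumps(skill, indent=2)
```

The resulting JSON would sit in the `skills` array of a skillset alongside any built-in skills. The management overhead the customer is concerned about comes from hosting and securing whatever sits behind `uri`.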

Azure AI Search
An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.
Azure AI Custom Vision
An Azure artificial intelligence service and end-to-end platform for applying computer vision to specific domains.
Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

Accepted answer
  1. brtrach-MSFT 15,786 Reputation points Microsoft Employee
    2023-11-07T23:56:11.0366667+00:00

    @Mullika Phanhong Thank you for reaching out regarding Thai support in the built-in OCR skill.

    Historically, the process has been for the Azure AI Vision API (which can be used with custom skillsets) to release its update first. In this instance, the current production version is 3.2, which does not support Thai. Version 4.0 is currently in preview and does support Thai. It is expected to move to production in the coming weeks.

    The built-in OCR skill team has confirmed, though, that they cannot consider adopting a new version until Azure AI Vision has released it to production. Once 4.0 releases as an API in the coming weeks, they will need to research it and agree to adopt it. If they agree, they will then develop, test, and likely ship a preview before a production version. All of this is to say that, even if 4.0 is accepted into the built-in OCR skill, it could be a long time before it is available for consumption.

    I am aware that your customer does not want to develop a custom skill and use an API because of the management tasks associated with that approach. If they want Thai support, though, they will likely need to be open to this approach, depending on their project timeline. They can move forward immediately with Azure Document Intelligence Read, which supports Thai printed text today, or they can wait a few more weeks for Azure AI Vision API 4.0, which will also support Thai printed text.
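To make the custom-skill option more concrete, the following is a rough sketch of the request/response handling such an API would need. It follows the custom Web API skill contract (a `values` array of records, each with a `recordId` and `data`, echoed back with `errors` and `warnings`). The `read_text` callable is a hypothetical stand-in for whatever code calls the Document Intelligence Read model. The assumption that the image arrives base64-encoded under `data.image.data` reflects one possible input mapping and should be checked against the actual skillset.

```python
import base64
from typing import Callable


def handle_skill_request(body: dict, read_text: Callable[[bytes], str]) -> dict:
    """Process one custom-skill request and return the response payload.

    read_text is a hypothetical function that takes raw image bytes
    and returns the recognized text (for example, by calling the
    Document Intelligence Read model).
    """
    results = []
    for record in body.get("values", []):
        output = {
            "recordId": record.get("recordId"),
            "data": {},
            "errors": [],
            "warnings": [],
        }
        try:
            # Assumed input shape: {"image": {"data": "<base64>"}}
            image_b64 = record["data"]["image"]["data"]
            image_bytes = base64.b64decode(image_b64)
            output["data"]["text"] = read_text(image_bytes)
        except Exception as exc:
            # Report the failure per record so other records still succeed.
            output["errors"].append({"message": f"OCR failed: {exc}"})
        results.append(output)
    return {"values": results}


# Usage with a stub in place of the real OCR call: the stub just
# decodes the bytes as UTF-8 so the example runs offline.
def fake_read(image_bytes: bytes) -> str:
    return image_bytes.decode("utf-8")


sample = {
    "values": [
        {
            "recordId": "1",
            "data": {
                "image": {
                    "data": base64.b64encode("สวัสดี".encode("utf-8")).decode("ascii")
                }
            },
        }
    ]
}
response = handle_skill_request(sample, fake_read)
```

Keeping the OCR call behind a plain callable means the same handler could later be pointed at Azure AI Vision 4.0 instead, without changing the skill contract.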
