Document Intelligence Studio Read does not read a PDF correctly

Javier Alonso Gutiérrez 0 Reputation points
2024-06-05T15:45:31.7433333+00:00

Hi. I was trying to train a custom extraction model in Document Intelligence Studio, but when it analyzes the PDF files containing the data, it does not read the text paragraphs correctly.

To isolate the problem, I switched to a simple Read analysis, but it reads only some portions of the text in the PDF document and fails to recognize most of the words correctly. Even language detection fails.

If I convert the PDF to JPG it works fine. What could be the problem?

Some samples:

Screenshot 2024-06-05 at 17.48.06

Screenshot 2024-06-05 at 17.39.47

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.