Form Recognizer: PDFs with custom temporary fonts are not recognized correctly
It seems that Form Recognizer does not correctly recognize text in PDF files created with custom temporary fonts.
For example, I have a file that was created with a custom font. In the PDF file, the text looks like this:
However, Form Recognizer returns this result:
?hZkd]’ Jej[dj_WbWki]b[Y^ kdZ iedij][ MY^kjpcWydW^c[d
I get the same result when I copy this text from the PDF file and paste it into a text editor.
As far as I can tell, in this case Form Recognizer does not run OCR on the rendered page image. Instead, it reads the plain text layer contained in the PDF file, and that text is not decoded correctly because of the font.
Are there any plans for a future release to correctly recognize text set in unknown fonts when those fonts are embedded in the PDF file?
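One possible client-side workaround, sketched here under stated assumptions and not an official Form Recognizer feature: check whether the text taken from the PDF's text layer looks garbled, and if it does, rasterize the page and submit the image instead, so that OCR has to run on the rendered glyphs. The helper names `suspicious_char_ratio` and `looks_garbled`, the suspicious character set, and the 5 % threshold are all hypothetical choices for illustration.

```python
# Characters that rarely occur in ordinary prose but show up often when a
# PDF's text layer is decoded through a custom font encoding without a usable
# Unicode mapping. This set is an assumption chosen for illustration.
SUSPICIOUS_CHARS = frozenset("[]^_`{}|\\~<>")


def suspicious_char_ratio(text: str) -> float:
    """Return the share of non-whitespace characters that are in SUSPICIOUS_CHARS."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        # No extractable text at all; report 0.0 and let the caller decide.
        return 0.0
    hits = sum(1 for ch in chars if ch in SUSPICIOUS_CHARS)
    return hits / len(chars)


def looks_garbled(text: str, threshold: float = 0.05) -> bool:
    """Heuristic: flag text as likely garbled if too many characters are suspicious.

    The 5 % threshold is a hypothetical default, not a tuned value.
    """
    return suspicious_char_ratio(text) > threshold


if __name__ == "__main__":
    extracted = "?hZkd]’ Jej[dj_WbWki]b[Y^ kdZ iedij][ MY^kjpcWydW^c[d"
    print(f"{suspicious_char_ratio(extracted):.3f}")  # 0.224 (11 of 49 characters)
    print(looks_garbled(extracted))  # True
```

If `looks_garbled` returns `True`, the page could be rendered to an image with a PDF rasterizer of your choice and sent to Form Recognizer as an image, which has no text layer to fall back on. A character-class heuristic like this is cheap but crude; it will miss garbled output that happens to map onto ordinary letters only.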