Form Recognizer API V2.1 how can I split fields?

Danicode 131 Reputation points
2020-11-29T21:19:04.163+00:00

Hi, I have an invoice with a product name column. In the result I get one long list of words, and I cannot split them back into their original rows.

An example: I have 2 rows, the first product is "MY FIRST PRODUCT" and the second row is "ANOTHER PRODUCT".

Using the Azure.AI.FormRecognizer NuGet package, I get the string "MY FIRST PRODUCT ANOTHER PRODUCT", so I cannot tell the product names apart.

How can this be solved? Is there a way to add a delimiter during training?

Thanks!

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. romungi-MSFT 45,731 Reputation points Microsoft Employee
    2020-11-30T08:39:00.543+00:00

    @Danicode A Form Recognizer resource lets you train a custom model on your own document format, so that the Analyze API can use that trained model to detect similar documents and return a similar response with the extracted fields. Since you are using the NuGet package, the client should be calling Form Recognizer API version 2.0, probably with a prebuilt model.

    If you have a document, or a set of documents, with a particular layout (for example tables, headings, and text), you can use the labeling tool to train a custom model to recognize the text you need to extract. The tool lets you tag the text and label it, so the response from the trained model returns that text as separate fields that you can use for processing. If your document has properly formatted tables, the table is also detected and all the fields in the table are part of the response, which makes the text simpler to use in your application. For example, this sample document produces a JSON response in which the text in the first row of the first column is returned along with its coordinates.

    43653-image.png

    Response:

    "text":"2.125% Notes due 2021","appearance":{"style":{"name":"other","confidence":1}},"words":[{"boundingBox":[0.6321,5.3347,1.0245,5.3347,1.0245,5.4241,0.6321,5.4241],"text":"2.125%","confidence":1},{"boundingBox":[1.0711,5.3361,1.385,5.3361,1.385,5.4221,1.0711,5.4221],"text":"Notes","confidence":1},{"boundingBox":[1.4281,5.3361,1.6296,5.3361,1.6296,5.4221,1.4281,5.4221],"text":"due","confidence":1},{"boundingBox":[1.6695,5.3358,1.9098,5.3358,1.9098,5.4221,1.6695,5.4221],"text":"2021","confidence":1}]}  
    

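Each word in a response like the one above carries a `boundingBox` of eight numbers, the x and y coordinates of its four corners. As a rough sketch, not an official API, those coordinates can be used to split a run of words back into rows by grouping words whose vertical centers are close together. The function name, the `tolerance` value, and the sample coordinates below are all assumptions made for illustration; only the `text` and `boundingBox` keys come from the response shown.

```python
from typing import Dict, List


def box(x: float, y: float, w: float, h: float) -> List[float]:
    """Build an 8-number box (4 corners, x then y) for the hypothetical sample data."""
    return [x, y, x + w, y, x + w, y + h, x, y + h]


def group_words_into_rows(words: List[Dict], tolerance: float = 0.05) -> List[str]:
    """Group words into rows by the vertical center of their bounding boxes.

    `tolerance` is an assumed threshold in the same unit as the coordinates;
    it needs tuning for real documents.
    """
    def center_y(word: Dict) -> float:
        ys = word["boundingBox"][1::2]  # every second value is a y coordinate
        return sum(ys) / len(ys)

    def left_x(word: Dict) -> float:
        return min(word["boundingBox"][0::2])  # smallest x coordinate

    rows: List[Dict] = []
    for word in sorted(words, key=center_y):
        y = center_y(word)
        if rows and abs(y - rows[-1]["y"]) <= tolerance:
            rows[-1]["words"].append(word)  # same row as the previous word
        else:
            rows.append({"y": y, "words": [word]})  # start a new row

    # Within each row, restore left-to-right reading order.
    return [
        " ".join(w["text"] for w in sorted(row["words"], key=left_x))
        for row in rows
    ]


# Hypothetical words for the two product rows described in the question.
SAMPLE_WORDS = [
    {"text": "MY", "boundingBox": box(0.63, 5.33, 0.25, 0.09)},
    {"text": "FIRST", "boundingBox": box(0.93, 5.34, 0.40, 0.09)},
    {"text": "PRODUCT", "boundingBox": box(1.38, 5.33, 0.70, 0.09)},
    {"text": "ANOTHER", "boundingBox": box(0.63, 5.58, 0.70, 0.09)},
    {"text": "PRODUCT", "boundingBox": box(1.38, 5.58, 0.70, 0.09)},
]

print(group_words_into_rows(SAMPLE_WORDS))
# ['MY FIRST PRODUCT', 'ANOTHER PRODUCT']
```

This only works when rows do not overlap vertically and the page is not skewed, so it is a fallback for cases where the service does not return the table structure itself.
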
    Please try using the labeling tool to create a custom model, and then use that model with the Analyze API. This should let you tag your text, or detect the table entirely, and use the extracted fields as they are.
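
When the table is detected, the product names no longer need to be split by hand, because each cell is reported with its own row and column position. The sketch below assumes the v2.1 REST response layout, in which `analyzeResult.pageResults[].tables[].cells[]` entries carry `rowIndex`, `columnIndex`, and `text`; verify these key names against your own response before relying on them. The function name and the sample payload are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def rows_from_tables(analyze_result: Dict) -> List[List[str]]:
    """Rebuild table rows from the cells of an assumed v2.1-style analyzeResult."""
    rows: List[List[str]] = []
    for page in analyze_result.get("pageResults", []):
        for table in page.get("tables", []):
            # rowIndex -> {columnIndex -> cell text}
            by_row: Dict[int, Dict[int, str]] = defaultdict(dict)
            for cell in table.get("cells", []):
                by_row[cell["rowIndex"]][cell["columnIndex"]] = cell["text"]
            for row_index in sorted(by_row):
                columns = by_row[row_index]
                rows.append([columns[c] for c in sorted(columns)])
    return rows


# Hypothetical payload shaped like the assumed response layout.
SAMPLE_RESULT = {
    "pageResults": [
        {
            "page": 1,
            "tables": [
                {
                    "rows": 2,
                    "columns": 2,
                    "cells": [
                        {"rowIndex": 1, "columnIndex": 0, "text": "ANOTHER PRODUCT"},
                        {"rowIndex": 0, "columnIndex": 1, "text": "2"},
                        {"rowIndex": 0, "columnIndex": 0, "text": "MY FIRST PRODUCT"},
                        {"rowIndex": 1, "columnIndex": 1, "text": "5"},
                    ],
                }
            ],
        }
    ]
}

for row in rows_from_tables(SAMPLE_RESULT):
    print(row)
# ['MY FIRST PRODUCT', '2']
# ['ANOTHER PRODUCT', '5']
```

Sorting by the reported indices, rather than trusting the order of the cells in the payload, keeps each product name on its own row even if cells arrive out of order.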

