Form Recognizer API v2.1: how can I split fields?

Question

Hi, I have an invoice with a product-name column, and in the result I get one long list of words that I cannot split back into individual rows.

For example, I have two rows: the first product is "MY FIRST PRODUCT" and the second is "ANOTHER PRODUCT".

Using the Azure.AI.FormRecognizer NuGet package, I get the string "MY FIRST PRODUCT ANOTHER PRODUCT", so I cannot tell where one product name ends and the next begins.

How can this be solved? Is there a way to add a delimiter during training?

Thanks!

Answer

@Danicode A Form Recognizer resource lets you train a custom model on your own document format. The analyze API can then use that trained model to recognize similar documents and return a response with the extracted fields. Since you are using the NuGet package, the client is probably calling API version 2.0 (the exact version depends on the package version you installed), most likely with a prebuilt model.

If your documents share a particular format, i.e. tables, headings and text, you can use the labeling tool to train a custom model to recognize the text you need to extract. The tool lets you tag the text and label it, so the trained model's response returns that text as separate fields that you can process individually. If your document contains properly formatted tables, the table is also detected and all of its cells are included in the response, which makes the text simpler to use in your application. For example, analyzing a sample document with a table returns JSON in which the text of the first row of the first column is shown along with the coordinates of each of its words.

Response:

{
  "text": "2.125% Notes due 2021",
  "appearance": {"style": {"name": "other", "confidence": 1}},
  "words": [
    {"boundingBox": [0.6321, 5.3347, 1.0245, 5.3347, 1.0245, 5.4241, 0.6321, 5.4241], "text": "2.125%", "confidence": 1},
    {"boundingBox": [1.0711, 5.3361, 1.385, 5.3361, 1.385, 5.4221, 1.0711, 5.4221], "text": "Notes", "confidence": 1},
    {"boundingBox": [1.4281, 5.3361, 1.6296, 5.3361, 1.6296, 5.4221, 1.4281, 5.4221], "text": "due", "confidence": 1},
    {"boundingBox": [1.6695, 5.3358, 1.9098, 5.3358, 1.9098, 5.4221, 1.6695, 5.4221], "text": "2021", "confidence": 1}
  ]
}
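If the product names come back as one run of text, the word-level coordinates in a response like this can still be used to split them. Below is a minimal sketch, assuming each product name sits on its own printed line and that every word carries an eight-number `boundingBox` (x, y pairs for the four corners, starting top-left). It groups words whose vertical centers are close together and joins each group left to right. The sample data, the `group_into_rows` helper and the 0.05 tolerance are hypothetical. The tolerance is in the response's units (inches for a PDF like the one above) and will need tuning for your documents.

```python
import json

# Made-up sample shaped like the "words" array above: two product rows.
# Each boundingBox lists the (x, y) corners clockwise from top-left.
SAMPLE = """
[
  {"boundingBox": [1.0, 3.00, 1.3, 3.00, 1.3, 3.10, 1.0, 3.10], "text": "MY"},
  {"boundingBox": [1.4, 3.00, 1.9, 3.00, 1.9, 3.10, 1.4, 3.10], "text": "FIRST"},
  {"boundingBox": [2.0, 3.01, 2.7, 3.01, 2.7, 3.11, 2.0, 3.11], "text": "PRODUCT"},
  {"boundingBox": [1.0, 3.30, 1.8, 3.30, 1.8, 3.40, 1.0, 3.40], "text": "ANOTHER"},
  {"boundingBox": [1.9, 3.30, 2.6, 3.30, 2.6, 3.40, 1.9, 3.40], "text": "PRODUCT"}
]
"""


def center_y(box):
    """Average of the four corner y-values (indices 1, 3, 5, 7)."""
    return sum(box[1::2]) / 4


def group_into_rows(words, tolerance=0.05):
    """Hypothetical helper: group words by vertical position into text rows.

    A word joins the current row when its vertical center is within
    `tolerance` of the center of the row's first word; otherwise it
    starts a new row.
    """
    rows = []  # list of (anchor_y, [words]) in top-to-bottom order
    for word in sorted(words, key=lambda w: center_y(w["boundingBox"])):
        y = center_y(word["boundingBox"])
        if rows and abs(y - rows[-1][0]) <= tolerance:
            rows[-1][1].append(word)
        else:
            rows.append((y, [word]))
    # Within each row, order words left to right by their top-left x.
    return [
        " ".join(w["text"] for w in sorted(ws, key=lambda w: w["boundingBox"][0]))
        for _, ws in rows
    ]


if __name__ == "__main__":
    for row in group_into_rows(json.loads(SAMPLE)):
        print(row)
```

This is only a layout heuristic: a product name that wraps onto a second printed line would come out as two rows, so a detected table or labeled fields are more reliable when they are available.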

Please try using the labeling tool to create a custom model, and then call the analyze API with that model. This should let you either tag your text as separate fields or have the whole table detected, so that you can use the extracted fields as-is.
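When a table is detected, each cell already carries its row and column position, so the rows can be rebuilt without any delimiter. Below is a minimal sketch, assuming the v2.x JSON layout `analyzeResult.pageResults[].tables[].cells[]`, with `rows`/`columns` on each table and `rowIndex`/`columnIndex`/`text` on each cell. Please verify these names against your own response (the SDK's page objects expose the same tables). The table contents and the `table_rows`/`all_tables` helpers are hypothetical.

```python
# Hand-written, made-up stand-in for part of a v2.x analyze response.
SAMPLE_RESPONSE = {
    "analyzeResult": {
        "pageResults": [
            {
                "page": 1,
                "tables": [
                    {
                        "rows": 3,
                        "columns": 2,
                        "cells": [
                            {"rowIndex": 0, "columnIndex": 0, "text": "Product"},
                            {"rowIndex": 0, "columnIndex": 1, "text": "Qty"},
                            {"rowIndex": 1, "columnIndex": 0, "text": "MY FIRST PRODUCT"},
                            {"rowIndex": 1, "columnIndex": 1, "text": "2"},
                            {"rowIndex": 2, "columnIndex": 0, "text": "ANOTHER PRODUCT"},
                            {"rowIndex": 2, "columnIndex": 1, "text": "1"},
                        ],
                    }
                ],
            }
        ]
    }
}


def table_rows(table):
    """Hypothetical helper: rebuild one table as a list of rows of cell text."""
    rows = [[""] * table["columns"] for _ in range(table["rows"])]
    for cell in table["cells"]:
        # Spanning cells are written only at their top-left position here.
        rows[cell["rowIndex"]][cell["columnIndex"]] = cell["text"]
    return rows


def all_tables(response):
    """Hypothetical helper: yield every detected table, page by page."""
    for page in response["analyzeResult"]["pageResults"]:
        for table in page.get("tables", []):
            yield table_rows(table)


if __name__ == "__main__":
    for rows in all_tables(SAMPLE_RESPONSE):
        header, *body = rows
        for row in body:
            print(row[0])  # the product-name column
```

If your invoices use merged cells, you would also need to handle the cells' span properties, which this sketch ignores.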
