Need help on table extraction from a document

Sriramsubramaniyan Nadarajan 76 Reputation points
2024-06-26T13:25:17.5+00:00

Hi All,

We have tables in a document that we need to extract. We invoke Form Recognizer through the REST API, and the OCR response contains an element named tables with all the cell details. Is parsing the tables element cell by cell the only way to extract the table data, or is there another method that reads and extracts the table directly?

  1. When the cell borders are not clearly visible, the cell segregation is sometimes missing from the extracted results, and all the data is returned as a single cell element. Please let me know if there is any way to avoid this issue. Thanks
Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

Accepted answer
  1. santoshkc 6,380 Reputation points Microsoft Vendor
    2024-06-26T14:27:23.4566667+00:00

    Hi @Sriramsubramaniyan Nadarajan,

    Thank you for your query regarding table extraction. To extract data from tables in the OCR response of Form Recognizer (now Document Intelligence), you typically need to parse the tables element cell by cell. While the service provides detailed cell information, there isn't a direct method to extract the entire table at once without parsing each cell. However, you can simplify this process by using prebuilt models or training custom models to handle specific structured content, which can help map headers and organize the data more efficiently.

    For more info see: Analyze document table extraction.

    To avoid issues with unclear cell borders causing data to be extracted as a single cell, ensure your tables have clearly defined borders and sufficient spacing. Preprocessing documents to enhance table lines or using higher resolution scans can help. Training a custom model with Form Recognizer tailored to your document layout can also improve table detection and data extraction accuracy.

    See the page: Input requirements

    For best practices to achieve higher accuracy, see: Ensure high model accuracy.

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, please click "Accept Answer" and "Yes" for "Was this answer helpful".
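The cell-by-cell parsing described in this answer can be sketched roughly as follows. This is a minimal illustration, not an official sample: it assumes the v3-style JSON layout in which each entry of analyzeResult.tables carries rowCount, columnCount, and a cells list with rowIndex, columnIndex, content, and optional rowSpan and columnSpan. The helper names table_to_grid and extract_tables and the sample_response data are hypothetical. Older API versions use a different response shape, so the field names should be checked against the response actually received.

```python
import json


def table_to_grid(table):
    """Turn one entry of analyzeResult.tables into a list of rows (2D list)."""
    rows, cols = table["rowCount"], table["columnCount"]
    grid = [["" for _ in range(cols)] for _ in range(rows)]
    for cell in table["cells"]:
        row, col = cell["rowIndex"], cell["columnIndex"]
        # rowSpan / columnSpan are assumed to be present only on merged
        # cells, so default to 1. min() guards against spans that would
        # run past the declared table size.
        for r in range(row, min(row + cell.get("rowSpan", 1), rows)):
            for c in range(col, min(col + cell.get("columnSpan", 1), cols)):
                # The text of a merged cell is repeated in every slot it covers.
                grid[r][c] = cell.get("content", "")
    return grid


def extract_tables(response):
    """Return one grid per table found in a parsed analyze response."""
    tables = response.get("analyzeResult", {}).get("tables", [])
    return [table_to_grid(t) for t in tables]


# Hypothetical, hand-written response fragment used only for illustration.
sample_response = json.loads("""
{
  "analyzeResult": {
    "tables": [
      {
        "rowCount": 3,
        "columnCount": 2,
        "cells": [
          {"rowIndex": 0, "columnIndex": 0, "content": "Item"},
          {"rowIndex": 0, "columnIndex": 1, "content": "Qty"},
          {"rowIndex": 1, "columnIndex": 0, "content": "Widget"},
          {"rowIndex": 1, "columnIndex": 1, "content": "3"},
          {"rowIndex": 2, "columnIndex": 0, "columnSpan": 2, "content": "Total: 3"}
        ]
      }
    ]
  }
}
""")

for grid in extract_tables(sample_response):
    for row in grid:
        print(" | ".join(row))
```

Building a fixed-size grid first, rather than appending cells in arrival order, keeps the result rectangular even when the response omits empty cells or reports merged ones.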


1 additional answer

  1. Sriramsubramaniyan Nadarajan 76 Reputation points
    2024-06-26T16:50:48.5666667+00:00

    Hi @santoshkc ,

    Thanks a lot for sharing all the details. It is very helpful.

    Could you please share sample code to extract the cells from the table, if possible? Thanks
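As a possible starting point for the sample requested here, the sketch below reads the cells of a single table and maps each data row to its column headers. It is an illustration under stated assumptions rather than an official sample: it assumes the v3-style response in which header cells are marked with kind set to columnHeader, and the names table_to_records and sample_table are hypothetical. Headers are not always detected, so columns without one fall back to a generated name.

```python
def table_to_records(table):
    """Convert one table from analyzeResult.tables into a list of dicts,
    one per data row, keyed by column header text."""
    headers = {}  # columnIndex -> header text
    body = {}     # rowIndex -> {columnIndex: cell text}
    for cell in table["cells"]:
        text = cell.get("content", "")
        if cell.get("kind") == "columnHeader":
            # With several header rows, the lowest header row wins here.
            headers[cell["columnIndex"]] = text
        else:
            body.setdefault(cell["rowIndex"], {})[cell["columnIndex"]] = text

    records = []
    for row_index in sorted(body):
        record = {}
        for col in range(table["columnCount"]):
            # Fall back to a positional name when no header was detected.
            name = headers.get(col, f"column_{col}")
            record[name] = body[row_index].get(col, "")
        records.append(record)
    return records


# Hypothetical table entry, hand-written for illustration only.
sample_table = {
    "rowCount": 3,
    "columnCount": 2,
    "cells": [
        {"kind": "columnHeader", "rowIndex": 0, "columnIndex": 0, "content": "Item"},
        {"kind": "columnHeader", "rowIndex": 0, "columnIndex": 1, "content": "Qty"},
        {"rowIndex": 1, "columnIndex": 0, "content": "Widget"},
        {"rowIndex": 1, "columnIndex": 1, "content": "3"},
        {"rowIndex": 2, "columnIndex": 0, "content": "Gadget"},
        {"rowIndex": 2, "columnIndex": 1, "content": "5"},
    ],
}

for record in table_to_records(sample_table):
    print(record)
```

In a real call, the table dictionaries would come from the tables list of the parsed JSON response instead of a hand-written sample.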