Optimizing interpretation of image of a datatable

Jonas Bergman 0 Reputation points
2024-02-14T08:50:14.4933333+00:00

Hi I find the OpenAi Chat Playground with Vision enhancement very useful, but have some difficulties getting the perfect result when interpreting an image with a data table (contains time sheet for employees).

The good thing is that it is easy to give clear instructions on how to sort data and on what format I want AI to return the extracted data. My main problem is that the AI engine randomly fills in gaps in the time sheet with made-up data. I have tried to adjust Top P and Temperature respectively, and also changed the Frequency and Presence penalties. Maximizing the penalties and minimizing Temp or Top P gives decent results, but when the time sheet contains a lot of data it seems lika AI finds patterns and fills out gaps. I have instructed that filling gaps with made-up data is not allowed, but it still does that now and then. Attached a simple example of this. First attempt is correct, the second attempt I get a made-up record in return. Sometimes I get multiple false rows, sometimes I get timestamps that are wrong - time stamps that match patterns of previous workdays instead of reflecting the correct values. So ... how can I combine parameters and instructions to make sure that the only thing I get as output is what the uploaded data table image contains?

Azure Computer Vision
Azure Computer Vision
An Azure artificial intelligence service that analyzes content in images and video.
338 questions
Azure OpenAI Service
Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.
2,543 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Jonas Bergman 0 Reputation points
    2024-02-14T12:28:33.4733333+00:00

    Thanks. I tried that one now. First impression is that Vision Enhanced Chat actually was better at recognizing the content of the image, Document Intelligence had more difficulties separating the cells in each row. We will make a few more tests, and also try to be even more instructive when telling the Chat how to behave. Is there some kind of best practice on which settings to use to minimize the risk for made-up data when using Vision enhanced chat?

    0 comments No comments