Azure OpenAI Chat RAG with Own Tabular Data not working at all
I'm building a chatbot with Azure AI Search and Azure OpenAI to "talk" to my own data. I work in sports, and we need a chatbot that can retrieve actual values from our Excel files and perform basic summarizing, sorting, and filtering tasks. Our data is mainly historical data on athletes. Here is a small sample of what we have:
Player | Scoring | Games-In | Red-Flags | Net-Worth | League |
---|---|---|---|---|---|
John Doe | 87 | 150 | 2 | 12,500,000 | Alpha League |
Jane Smith | 92 | 200 | 1 | 25,000,000 | Beta League |
Max Powers | 75 | 180 | 3 | 10,000,000 | Gamma League |
Lucy Heart | 68 | 210 | 4 | 15,000,000 | Delta League |
Jack Hunter | 80 | 170 | 2 | 8,000,000 | Epsilon League |
Emma Stone | 95 | 220 | 1 | 30,000,000 | Zeta League |
Liam Knight | 78 | 160 | 2 | 11,500,000 | Eta League |
Ava Strong | 85 | 190 | 3 | 14,000,000 | Theta League |
Noah Swift | 90 | 230 | 1 | 20,000,000 | Iota League |
Mia Quick | 83 | 175 | 2 | 9,000,000 | Kappa League |
I have successfully created an index with a semantic ranker in Azure AI Search, dividing our data into chunks to make it more manageable for the search algorithm. We also decided to include a file with the definition of each column. For instance, 'Net-Worth' is the athlete's value in US dollars for business and financial purposes.
But the chatbot fails even on basic questions:
When I ask, "What is the scoring of Jane Smith?" instead of returning 92, it says, "Sorry, I don't have sufficient data to answer that question."
When I ask, "Who are the top 3 athletes with the highest net worth?" instead of listing the three players and their net worth in descending order, it gives the same response: "Sorry, I don't have sufficient data to answer that question."
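To make the expected answers concrete, here is a minimal sketch in plain Python that computes them directly from the ten sample rows shown earlier. It only illustrates the results I'm looking for; it is not part of the Azure setup, and the helper names `scoring_of` and `top_by_net_worth` are made up for this example.

```python
# Sample rows copied from the table above (Net-Worth without thousands separators).
COLUMNS = ["Player", "Scoring", "Games-In", "Red-Flags", "Net-Worth", "League"]
ROWS = [
    ("John Doe", 87, 150, 2, 12500000, "Alpha League"),
    ("Jane Smith", 92, 200, 1, 25000000, "Beta League"),
    ("Max Powers", 75, 180, 3, 10000000, "Gamma League"),
    ("Lucy Heart", 68, 210, 4, 15000000, "Delta League"),
    ("Jack Hunter", 80, 170, 2, 8000000, "Epsilon League"),
    ("Emma Stone", 95, 220, 1, 30000000, "Zeta League"),
    ("Liam Knight", 78, 160, 2, 11500000, "Eta League"),
    ("Ava Strong", 85, 190, 3, 14000000, "Theta League"),
    ("Noah Swift", 90, 230, 1, 20000000, "Iota League"),
    ("Mia Quick", 83, 175, 2, 9000000, "Kappa League"),
]
records = [dict(zip(COLUMNS, row)) for row in ROWS]


def scoring_of(player):
    """Exact lookup: return the Scoring value for one player, or None if absent."""
    for record in records:
        if record["Player"] == player:
            return record["Scoring"]
    return None


def top_by_net_worth(n):
    """Return the n (player, net worth) pairs with the highest Net-Worth, descending."""
    ranked = sorted(records, key=lambda r: r["Net-Worth"], reverse=True)
    return [(r["Player"], r["Net-Worth"]) for r in ranked[:n]]


print(scoring_of("Jane Smith"))  # 92
print(top_by_net_worth(3))       # Emma Stone, Jane Smith, Noah Swift
```

The first question is an exact lookup on one row, while the second needs a sort over every row, so the two may not fail for the same reason.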
Mind you, my data is extensive: almost 15,000 entries, which I have divided into very small chunks, repeating the headers in each chunk so that context is not lost.
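The idea of keeping the headers in every chunk can be illustrated as follows. This is a simplified sketch over CSV text rather than the actual Excel/XML files, and both the function name `chunk_with_header` and the chunk size of 2 are assumptions chosen for the example.

```python
import csv
import io


def chunk_with_header(csv_text, rows_per_chunk):
    """Split CSV text into chunks, repeating the header row at the top of each chunk."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    chunks = []
    for start in range(0, len(rows), rows_per_chunk):
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(header)  # every chunk carries the column names
        writer.writerows(rows[start:start + rows_per_chunk])
        chunks.append(buffer.getvalue())
    return chunks


sample = """Player,Scoring,Games-In,Red-Flags,Net-Worth,League
John Doe,87,150,2,12500000,Alpha League
Jane Smith,92,200,1,25000000,Beta League
Max Powers,75,180,3,10000000,Gamma League
"""

for chunk in chunk_with_header(sample, rows_per_chunk=2):
    print(chunk)
```

With three data rows and two rows per chunk, this produces two chunks, each starting with the same header line.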
For those who have worked with tabular data with lots of quantitative and qualitative information in chatbots, what do you suggest is the best approach to solve these kinds of issues?
I'm currently dividing the Excel files into smaller Excel files, which are in XML format. Would it be better to divide them into chunks of JSON, CSV, or .txt files instead? I would greatly appreciate any help from those of you who have worked with this type of data. Thank you.
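For reference, this is roughly what a single row would look like in each of the three alternative formats. It is only a sketch using the Python standard library; the helper names `as_json`, `as_csv`, and `as_text` and the "key: value" text layout are assumptions for illustration, and it says nothing about which format retrieves better.

```python
import csv
import io
import json

COLUMNS = ["Player", "Scoring", "Games-In", "Red-Flags", "Net-Worth", "League"]
# One row from the sample table (Net-Worth without thousands separators).
record = dict(zip(COLUMNS, ["Jane Smith", 92, 200, 1, 25000000, "Beta League"]))


def as_json(rec):
    """One JSON object per row; column names travel with every value."""
    return json.dumps(rec)


def as_csv(rec):
    """A header line followed by one data line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rec), lineterminator="\n")
    writer.writeheader()
    writer.writerow(rec)
    return buffer.getvalue()


def as_text(rec):
    """A single plain-text line of 'column: value' pairs (hypothetical layout)."""
    return "; ".join(f"{key}: {value}" for key, value in rec.items())


print(as_json(record))
print(as_csv(record))
print(as_text(record))
```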