OpenAI Studio Playground: Questions about CSV Dataset are not answered

Istvan Molnar 20 Reputation points
2024-05-25T06:43:12.0933333+00:00

Hi,

I have created a Blob Storage for my dataset:


https://en.wikipedia.org/wiki/George_Washington

https://www.kaggle.com/datasets/sootersaalu/amazon-top-50-bestselling-books-2009-2019

https://www.kaggle.com/datasets/neuromusic/avocado-prices

and connected to my deployment:

{
  "systemPrompt": "You are an AI assistant that helps people find information.",
  "fewShotExamples": [],
  "chatParameters": {
    "deploymentName": "gpt-4o", // same when using "gpt-4-32k"
    "maxResponseLength": 800,
    "temperature": 0.7,
    "topProbablities": 1,
    "stopSequences": null,
    "pastMessagesToInclude": 10,
    "frequencyPenalty": 0.42,
    "presencePenalty": 0.42
  }
}

The model in the playground can answer questions about George Washington, but it has problems with the books and avocados datasets.

I have tried vector and keyword indexing, with different chunk sizes (256, 512, 1024) without any success.

Attempted prompts:

  • who is Elon Musk?
    • "The requested information is not available in the retrieved data. Please try another query or topic." ✅
  • who was George Washington?
    • "George Washington was a Founding Father and the first president of the United States..." ✅
  • which book was the most expensive on Amazon in 2012?
    • "The requested information is not available in the retrieved data. Please try another query or topic." ❌

What am I doing wrong?

Thank you

Azure OpenAI Service
Azure Startups

Accepted answer
  1. AshokPeddakotla-MSFT 30,066 Reputation points
    2024-05-27T02:42:29.9133333+00:00

    Istvan Molnar, greetings and welcome to the Microsoft Q&A forum!

    I understand that the model generates responses only for the George Washington data and not for the other datasets. This could be due to unsupported file formats: the two Kaggle datasets are CSV files.

    Please note that Azure OpenAI On Your Data supports the following file types:

    • .txt
    • .md
    • .html
    • .docx
    • .pptx
    • .pdf

    Please check the "supported file types and formats" documentation for more details.

    Also, there's an upload limit, and there are some caveats about document structure and how it might affect the quality of responses from the model:

    • If you're converting data from an unsupported format into a supported format, optimize the quality of the model response by ensuring the conversion:
      • Doesn't lead to significant data loss.
      • Doesn't add unexpected noise to your data.
    • If your files have special formatting, such as tables and columns, or bullet points, prepare your data with the data preparation script available on GitHub.
    • For documents and datasets with long text, you should use the available data preparation script. The script chunks data so that the model's responses are more accurate. This script also supports scanned PDF files and images.
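
As a rough illustration of the conversion point above, here is a minimal Python sketch that turns CSV content into a Markdown file with one labeled line per row, so that each row carries its own column names when it is chunked and indexed. This is not the data preparation script mentioned above; the function name `csv_to_markdown`, the column names, and the sample rows are made up for illustration and are not taken from the Kaggle files.

```python
import csv
import io


def csv_to_markdown(csv_text: str, title: str) -> str:
    """Render every CSV row as one Markdown bullet of 'column: value' pairs.

    Hypothetical helper for illustration only. Keeping the column name next
    to each value avoids losing the header context when the text is chunked.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = [f"# {title}", ""]
    for row in reader:
        facts = []
        for column, value in row.items():
            # DictReader uses a None key for surplus fields and a None value
            # for missing ones; skip both instead of guessing their meaning.
            if column is None or value is None:
                continue
            facts.append(f"{column.strip()}: {value.strip()}")
        lines.append("- " + "; ".join(facts))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up sample data; replace with the contents of your own CSV file.
    sample = (
        "Name,Author,Price,Year\n"
        "Example Book A,Jane Doe,12,2012\n"
        "Example Book B,John Roe,30,2012\n"
    )
    print(csv_to_markdown(sample, "Books"))
```

The resulting text can be saved with a .md or .txt extension and uploaded in place of the CSV. Questions that require comparing many rows at once (for example, finding the maximum price in a given year) may still be difficult, because the model only sees the chunks that retrieval returns.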

    Do let me know if you have any further queries.
