We are experiencing an issue where the data for multiple customers is getting mixed when fetching the summary for a single customer using GPT-4

Amulya Prasad 0 Reputation points
2024-06-27T10:18:20.2233333+00:00

We are using Blob storage and AI search services to index the documents of multiple customers, and we integrate the results with GPT-4 to generate a summary for each customer. However, when we query for a single customer, data from other customers is mixed into the summary. Each document contains a unique ID with the customer number, and the query is made using that unique ID.

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.
2,528 questions
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.
2,567 questions

1 answer

  1. YutongTie-MSFT 47,991 Reputation points
    2024-06-30T00:30:21.5266667+00:00

    Hello @Amulya Prasad

    Thanks for reaching out to us. It sounds like you're encountering a data isolation issue: queries for a single customer are retrieving data from multiple customers instead of only the intended one. This can be hard to track down when Blob storage, AI search services, and GPT-4 are combined to produce customer-specific summaries.

    I suggest checking each part separately to find out which one causes the issue. First, perform tests to validate query isolation:

    • Manual Testing: Use the AI search service's query interface or tools to manually test queries for different customer IDs. Verify that the results returned correspond only to the documents associated with the queried customer.

    • Automated Testing: Implement automated tests in your application or development environment that specifically target query isolation. These tests should simulate various scenarios where queries are made for different customer IDs and validate the correctness of the returned data.
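As a rough illustration of the automated isolation test described above, the following sketch checks a list of search results for documents that belong to a different customer. The field name `customer_id`, the helper `find_leaked_documents`, and the sample data are all hypothetical; in a real test the results would come from your search service rather than a hard-coded list.

```python
from typing import Iterable, Mapping


def find_leaked_documents(
    results: Iterable[Mapping[str, object]],
    expected_customer_id: str,
    id_field: str = "customer_id",  # hypothetical field name; use your own schema
) -> list:
    """Return every result that does not belong to the expected customer.

    Documents that lack the ID field entirely are also reported, because
    they cannot be proven to belong to the expected customer.
    """
    return [doc for doc in results if doc.get(id_field) != expected_customer_id]


# Simulated search results standing in for a real query response.
sample_results = [
    {"id": "doc-1", "customer_id": "C001", "content": "Invoice for C001"},
    {"id": "doc-2", "customer_id": "C002", "content": "Invoice for C002"},
    {"id": "doc-3", "customer_id": "C001", "content": "Contract for C001"},
]

leaked = find_leaked_documents(sample_results, "C001")
print([doc["id"] for doc in leaked])  # prints ['doc-2']
```

An isolation test would assert that this list is empty for every customer ID you query; a non-empty list points at the search step rather than at GPT-4.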

    If you still cannot find the root cause, verify that data segregation is properly implemented across your Blob storage, AI search indexes, and any other data sources you're using:

    • Unique IDs: Double-check that each document or record in your Blob storage and AI search index has a unique identifier associated with the specific customer it belongs to. This identifier should ideally be a customer ID or a unique key that unequivocally identifies the customer.

    • Query Implementation: Review how you're querying the data. When querying for a specific customer's summary, ensure that your query filters explicitly by the customer's unique ID. This should prevent any cross-customer data retrieval.
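To make "filter explicitly by the customer's unique ID" concrete, here is a minimal sketch that builds an OData filter expression and passes it alongside the search text. It assumes the index has a filterable field named `customer_id` (a hypothetical name) and that `client` behaves like `SearchClient` from the `azure-search-documents` package, whose `search` method accepts `search_text` and `filter` arguments. The helper names are illustrative only.

```python
def build_customer_filter(customer_id: str, field: str = "customer_id") -> str:
    """Build an OData equality filter for one customer.

    Single quotes inside the value are doubled, which is how OData
    escapes them inside a string literal.
    """
    escaped = customer_id.replace("'", "''")
    return f"{field} eq '{escaped}'"


def search_for_customer(client, query_text: str, customer_id: str):
    """Run a search that is always scoped to a single customer.

    `client` is assumed to expose search(search_text=..., filter=...),
    as azure.search.documents.SearchClient does.
    """
    return client.search(
        search_text=query_text,
        filter=build_customer_filter(customer_id),
    )


print(build_customer_filter("C001"))     # prints customer_id eq 'C001'
print(build_customer_filter("O'Brien"))  # prints customer_id eq 'O''Brien'
```

The point of the design is that the customer ID travels in the filter, not only in the search text. If the ID is only part of the full-text query, documents from other customers can still match on other terms and be returned.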

    Next, review the indexing and search configuration of your AI search service:

    • Index Definition: Verify that the index used by your AI search service includes the customer ID or unique key as a field and that this field is marked as filterable (and facetable, if you need facets). Only then can searches be scoped to the specific customer's data.
    • Search Queries: Inspect the queries you're sending to the AI search service. Ensure that they include the customer's unique ID as a filter criterion and that there are no unintended wildcard or broad queries that could retrieve data from multiple customers.
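One way to check the index definition is to inspect its field list and confirm that the customer ID field is explicitly filterable. The sketch below works on an index definition represented as a plain dictionary in the shape of the JSON used by the search service (a `fields` list whose entries carry `name`, `type`, and attributes such as `filterable`). The index name, the field names, and the helper `is_filterable` are hypothetical.

```python
# Hypothetical index definition, shaped like the JSON for a search index.
index_definition = {
    "name": "customer-docs",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "customer_id", "type": "Edm.String", "filterable": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
    ],
}


def is_filterable(index: dict, field_name: str) -> bool:
    """Return True only if the field exists and is explicitly filterable.

    Attribute defaults can differ depending on how the index was created,
    so this check is deliberately conservative: a missing "filterable"
    attribute is treated as not filterable.
    """
    for field in index.get("fields", []):
        if field.get("name") == field_name:
            return bool(field.get("filterable", False))
    return False


print(is_filterable(index_definition, "customer_id"))  # prints True
print(is_filterable(index_definition, "content"))      # prints False
```

If the check fails for the customer ID field, a filter on that field cannot be used, and queries would have to rely on full-text matching, which does not isolate customers.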

    Examine the process of ingesting data into Blob storage and indexing it into your AI search service:

    • Data Sources: Ensure that data ingestion processes correctly tag or annotate each document with the appropriate customer ID or key during ingestion.
    • Data Pipelines: Review the entire data pipeline from ingestion to indexing to querying. Look for any steps where customer IDs might be misinterpreted or overlooked.
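A simple guard in the ingestion pipeline is to refuse to index any document that does not carry a usable customer ID. The following sketch separates tagged from untagged documents before they are sent to the index. The field name `customer_id` and the helper `split_by_customer_tag` are assumptions for illustration, not part of any Azure SDK.

```python
def split_by_customer_tag(documents, id_field="customer_id"):
    """Separate documents with a non-empty customer ID from those without.

    Returns (tagged, untagged). Untagged documents should be fixed or
    rejected rather than indexed, since they cannot be filtered later.
    """
    tagged, untagged = [], []
    for doc in documents:
        value = doc.get(id_field)
        if isinstance(value, str) and value.strip():
            tagged.append(doc)
        else:
            untagged.append(doc)
    return tagged, untagged


# Hypothetical batch about to be indexed.
batch = [
    {"id": "doc-1", "customer_id": "C001", "content": "Invoice"},
    {"id": "doc-2", "customer_id": "", "content": "Unlabeled scan"},
    {"id": "doc-3", "content": "Missing the field entirely"},
]

tagged, untagged = split_by_customer_tag(batch)
print([d["id"] for d in tagged])    # prints ['doc-1']
print([d["id"] for d in untagged])  # prints ['doc-2', 'doc-3']
```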

    Lastly, review how GPT-4 integrates with the AI search service and Blob storage:

    • Input to GPT-4: Confirm that the data fed into GPT-4 for generating summaries is based on the filtered and isolated results retrieved from the AI search service. Ensure that there is no mixing or conflating of data from different customers at this stage.
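As a last line of defense, the code that assembles the GPT-4 prompt can re-check the customer ID on every retrieved document before including it. This is a minimal sketch under the assumption that each result is a dictionary with `customer_id` and `content` fields (hypothetical names); the prompt wording and the helper `build_summary_prompt` are illustrative only.

```python
def build_summary_prompt(customer_id, results,
                         id_field="customer_id", content_field="content"):
    """Build a summary prompt from documents of exactly one customer.

    Any result whose ID does not match is dropped, even if the search
    step was supposed to filter it out already.
    """
    own_docs = [d for d in results if d.get(id_field) == customer_id]
    if not own_docs:
        raise ValueError(f"No documents found for customer {customer_id!r}")
    context = "\n\n".join(str(d.get(content_field, "")) for d in own_docs)
    return (
        f"Summarize the following documents for customer {customer_id}. "
        "Use only the text between the markers.\n"
        "--- BEGIN DOCUMENTS ---\n"
        f"{context}\n"
        "--- END DOCUMENTS ---"
    )


# Hypothetical search results that still contain a stray document.
results = [
    {"customer_id": "C001", "content": "Invoice for C001"},
    {"customer_id": "C002", "content": "Invoice for C002"},
]

prompt = build_summary_prompt("C001", results)
print("Invoice for C002" in prompt)  # prints False
```

If this guard ever drops a document, the leak is upstream in the search or indexing step, which narrows down where to look.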

    If you go through all the steps and still cannot identify the root cause, please let us know.

    I hope this helps.

    Regards,

    Yutong

    -Please kindly accept the answer if you find it helpful, to support the community. Thanks a lot.