Handling Search Queries in Azure AI Search

Aakash Bhaikatti 0 Reputation points
2024-06-10T08:46:23.2366667+00:00

Hello,

I'm working on a project where we've implemented a RAG-like document processing pipeline using the Azure AI Search service. Here's a brief overview of our setup:

  1. We extract text from a PDF and store it in a .txt file inside Azure Blob Storage.
  2. Using Azure Blob Storage, we upload the document to an Azure AI Search index.
  3. We use an OpenAI prompt to generate 20 product keywords based on the content of the file, which are then stored in the index.
  4. We are employing a simple hybrid search to retrieve relevant answers from the indexed document.

The system works well for most queries. However, we run into issues when users submit queries that omit spaces or hyphens, use different letter casing, include special characters, or contain alphanumeric combinations. These queries do not return the expected relevant results.

Our specific questions are:

  1. How can we improve the search capability to handle queries with special characters, no spaces, hyphens, mixed cases, and alphanumeric combinations?
  2. Are there specific settings or configurations in Azure AI Search that can help in normalizing such queries before processing?
  3. What are the best practices for pre-processing or transforming user queries to enhance the search accuracy in this context?

Thank you for your assistance!


1 answer

  1. Sina Salam 6,581 Reputation points
    2024-06-10T13:57:46.2466667+00:00

    Hello Aakash Bhaikatti,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you are looking for ways to make Azure AI Search handle irregularly formatted queries, and that you have three specific questions.

    Solution

    I will provide the solution based on the scenario given and your questions.

    Q1:

    How can we improve the search capability to handle queries with special characters, no spaces, hyphens, mixed cases, and alphanumeric combinations?

    To handle such queries in Azure AI Search, you can combine custom analyzers, query preprocessing, synonym maps, and scoring profiles, and monitor search quality continuously so that you can adjust them over time.

    Flexible filtering, faceting, and sorting in Azure Cognitive Search. https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/flexible-filtering-faceting-and-sorting-in-azure-cognitive/ba-p/3038442.
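
As one illustration of the synonym-map option mentioned above, here is a minimal sketch that builds a synonym map payload in the Solr rule format (one rule per line). The map name `product-synonyms`, the helper `build_synonym_map`, and the rules themselves are hypothetical; the resulting dictionary would still need to be sent to your search service and the map attached to a searchable field.

```python
import json

def build_synonym_map(name, rules):
    """Build a synonym map payload; rules use the Solr format, one per line."""
    return {
        "name": name,
        "format": "solr",
        "synonyms": "\n".join(rules),
    }

# Hypothetical rules for the product-ID variants described in the question.
rules = [
    "productid, prodid, pid",   # equivalence: any of these terms matches the others
    "sku => productid",         # explicit mapping: "sku" is rewritten to "productid"
]

synonym_map = build_synonym_map("product-synonyms", rules)
print(json.dumps(synonym_map, indent=2))
```

Synonym rules are applied at query time, so changing them should not require re-indexing the documents.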

    Q2:

    Are there specific settings or configurations in Azure AI Search that can help in normalizing such queries before processing?

    Yes. Azure AI Search (formerly Azure Cognitive Search) has settings that normalize text before matching: custom analyzers (built from character filters, tokenizers, and token filters), synonym maps, and normalizers. Normalizers apply light text transformations such as consistent casing, folding accents and diacritics to their ASCII equivalents, and mapping characters like - and whitespace to a user-specified character. Note that normalizers act on filter, facet, and sort operations; for full-text queries, the same kind of work is done by the analyzer assigned to the field.

    Partial terms, patterns, and special characters. https://video2.skills-academy.com/en-us/azure/search/search-query-partial-matching.
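
To make the custom analyzer option concrete, here is a sketch of an index definition fragment written as a Python dictionary. The index name `product-docs`, the field `product_code`, the analyzer `code_analyzer`, and the character filter `strip_separators` are all hypothetical, and the property names should be checked against the current REST API reference before use. The idea is that hyphens, underscores, and spaces are removed and the text is lowercased, so "Prod-ID 123", "prodid123", and "PRODID123" would all produce the same token.

```python
import json

# Hypothetical index fragment: every name below is chosen for illustration only.
index_definition = {
    "name": "product-docs",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {
            "name": "product_code",
            "type": "Edm.String",
            "searchable": True,
            # The same analyzer runs at indexing time and at query time.
            "analyzer": "code_analyzer",
        },
    ],
    "charFilters": [
        {
            "@odata.type": "#Microsoft.Azure.Search.MappingCharFilter",
            "name": "strip_separators",
            # Map hyphen, underscore, and space (\u0020) to nothing.
            "mappings": ["-=>", "_=>", "\\u0020=>"],
        }
    ],
    "analyzers": [
        {
            "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
            "name": "code_analyzer",
            "charFilters": ["strip_separators"],
            # keyword_v2 keeps the whole field value as a single token.
            "tokenizer": "keyword_v2",
            "tokenFilters": ["lowercase", "asciifolding"],
        }
    ],
}

print(json.dumps(index_definition, indent=2))
```

Because the keyword tokenizer emits the entire value as one token, a setup like this suits short fields such as product codes or the generated keywords, not the full document text. Changing the analyzer on an existing field generally means rebuilding the index.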

    Q3:

    What are the best practices for pre-processing or transforming user queries to enhance the search accuracy in this context?

    Best practices for pre-processing user queries start with normalizing case. The Python function below combines several of these practices, with each step labeled in a comment:

    import re

    def preprocess_query(query):
        # Normalize case
        query = query.lower()
        # Remove special characters (keep letters, digits, whitespace, hyphens)
        query = re.sub(r'[^a-z0-9\s-]', '', query)
        # Treat hyphens as spaces
        query = query.replace('-', ' ')
        # Split alphanumeric combinations, e.g. "123abc" -> "123 abc"
        query = re.sub(r'(\d)([a-z])', r'\1 \2', query)
        query = re.sub(r'([a-z])(\d)', r'\1 \2', query)
        # Remove extra whitespace
        query = ' '.join(query.split())
        # Expand query with synonyms (example dictionary).
        # Keys are written in normalized form (lowercase, no hyphens)
        # because they are matched against the already-normalized query.
        synonyms = {
            "product": ["item", "good"],
            "prod id": ["product id"],
            "productid": ["product id"]
        }
        expanded_query = query
        for term, synonym_list in synonyms.items():
            # Match whole words only, so "product" does not fire inside "productid"
            if re.search(r'\b' + re.escape(term) + r'\b', query):
                expanded_query += " " + " ".join(synonym_list)
        return expanded_query

    # Example usage
    user_query = "ProductID-123ABC"
    normalized_query = preprocess_query(user_query)
    print(normalized_query)  # Output: "productid 123 abc product id"
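
Removing special characters is not always desirable, for example when a symbol is part of a product name. An alternative is to escape them so they are treated as literal text. The sketch below assumes the query is sent with the full Lucene query syntax, where the characters + - & | ! ( ) { } [ ] ^ " ~ * ? : \ / are reserved; the helper name `escape_lucene` is hypothetical.

```python
import re

# Characters that are reserved in the full Lucene query syntax.
_LUCENE_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')

def escape_lucene(query):
    """Prefix every reserved character with a backslash."""
    return _LUCENE_SPECIAL.sub(r'\\\1', query)

# Example usage
print(escape_lucene("C++ (v2.0)"))
print(escape_lucene("prod-id:123"))
```

Whether an escaped character can actually match still depends on the analyzer used for the field, since the default analyzer discards most punctuation during indexing.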
    

    You can read more about best practices and the steps to follow in Azure AI Search Database Selection: Optimizing Performance. https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/azure-ai-search-database-selection-optimizing-performance-and/ba-p/4155601.

    References

    Source: Flexible filtering, faceting, and sorting in Azure Cognitive Search. Accessed, 6/10/2024.

    Source: Partial terms, patterns, and special characters. Accessed, 6/10/2024.

    Source: Azure AI Search Database Selection: Optimizing Performance. Accessed, 6/10/2024.

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.

    Please don't forget to close the thread by upvoting and accepting this answer if it is helpful, so that others in the community facing similar issues can easily find the solution.

    Best Regards,

    Sina Salam
