Multimodal Vector Embeddings

Question

I was trying to create a multimodal vector index for text and images by running this notebook:

https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/e2e-demos/azure-ai-search-multimodal-build-demo.ipynb

What is the blob storage container connected to our Azure AI Search resource supposed to contain (a JSON file, the image files themselves, etc.)?

I'm asking because the code uses JSON_ARRAY as the BlobIndexerParsingMode. When my container holds just the JSON file, the indexer can read its contents (image URL, caption), but those contents are not valid inputs to the VisionVectorizeSkill.

I have attached the indexer warnings (screenshot: mappingError).

This is what the image embedding VisionVectorizeSkill shows in the skillset debugger. The input to the skill is the correct imageURL from the JSON file.

(screenshot: debug-skill-output)

Thanks

Answer

Suchit Bhayani, currently, if you need to vectorize images and JSON files or arrays that live in the same container, you can use two different indexers that point to the same index: one that processes the JSON arrays with one model, and one that processes the images with the other model.

Currently, a single indexer has no way to apply different processing to different file types (for example, when different models need to handle different kinds of files).

If you wish, you can share your feedback on UserVoice. All feedback shared in these forums is monitored and reviewed by the Microsoft engineering teams responsible for building Azure, and users with a similar request can upvote your post and add their comments.

That said, I have relayed this feedback internally to our product team. There is no ETA to share yet.

Here is a quick, general sample of two indexers: one that indexes only JSON files using the jsonArray parsing mode, and one that excludes JSON files:

{
  "name": "json-indexer",
  "dataSourceName": "your-data-source",
  "targetIndexName": "your-index",
  "skillsetName": "json-skillset",
  "parameters": {
    "configuration": {
      "parsingMode": "jsonArray",
      "indexedFileNameExtensions": ".json"
    }
  },
  "outputFieldMappings": [],
  "schedule": {
    "interval": "PT5M",
    "startTime": "2024-08-29T00:00:00Z"
  }
}

{
  "name": "image-indexer",
  "dataSourceName": "your-data-source",
  "targetIndexName": "your-index",
  "skillsetName": "image-skillset",
  "parameters": {
    "configuration": {
      "excludedFileNameExtensions": ".json",
      "imageAction": "generateNormalizedImages"
    }
  },
  "outputFieldMappings": [],
  "schedule": {
    "interval": "PT5M",
    "startTime": "2024-08-29T00:00:00Z"
  }
}
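For reference, here is a minimal sketch of what image-skillset might contain. It assumes the image indexer sets "imageAction": "generateNormalizedImages", so each image file is cracked into /document/normalized_images/*, which, as I understand the skill reference, is what the Azure AI Vision multimodal embeddings skill (#Microsoft.Skills.Vision.VectorizeSkill) expects as its image input. That would explain why an image URL read out of the JSON file is not accepted. The target name imageVector and the description text are hypothetical, the modelVersion value should be checked against the current skill documentation, and the skillset still needs an Azure AI services resource attached plus an output field mapping (or index projection) to write the vector into your index.

```json
{
  "name": "image-skillset",
  "description": "Hypothetical sketch: embeds each normalized image with the Azure AI Vision multimodal model",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Vision.VectorizeSkill",
      "context": "/document/normalized_images/*",
      "modelVersion": "2023-04-15",
      "inputs": [
        { "name": "image", "source": "/document/normalized_images/*" }
      ],
      "outputs": [
        { "name": "vector", "targetName": "imageVector" }
      ]
    }
  ]
}
```

The json-skillset could use the same skill type with a text input instead, for example { "name": "text", "source": "/document/caption" } with "context": "/document", where caption stands in for whichever field holds the caption in your JSON array. Embedding the captions with the same multimodal model keeps the text and image vectors in one embedding space.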

For more info about supported file types, inclusions and exclusions, please check out the doc: Indexer overview - Azure AI Search | Microsoft Learn

