How can we index blobs based on their last modified date in Azure AI Search?

00276266 0 Reputation points
2024-05-31T16:11:57.21+00:00

We have around 20 million blobs in our blob container for the years 2021, 2022, 2023, and 2024. We only want to index blobs from 2023 and 2024, which will be approximately 5 million in number. Essentially, we want to include only the files from the last 12 months in our index.

I would like to know if there is a method or filter available, either at the time of creating the index or while running an indexer, so that it only indexes the latest blobs and ignores those older than 12 months.

Azure AI Search
An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.
Azure Blob Storage
An Azure service that stores unstructured data in the cloud as blobs.

1 answer

  1. Grmacjon-MSFT 17,286 Reputation points
    2024-05-31T18:16:54.1466667+00:00

    Hi @00276266

    To index blobs in Azure AI Search based on their last modified date, you can use the metadata_storage_last_modified field, a timestamp indicating when the blob was last modified. Azure AI Search uses this timestamp to identify changed blobs, so it does not reindex everything after the initial indexing.
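
As a rough illustration of how the metadata_storage_last_modified field could be used, the Python sketch below builds a field definition and an OData filter string for a 12-month cutoff. It assumes the field is mapped into the index as a filterable Edm.DateTimeOffset; the names LAST_MODIFIED_FIELD and last_modified_filter are hypothetical. Note that a filter like this narrows results at query time only and does not keep older blobs out of the index.

```python
from datetime import datetime, timedelta, timezone

# Assumed index field definition (a plain dict mirroring the REST JSON shape).
LAST_MODIFIED_FIELD = {
    "name": "metadata_storage_last_modified",
    "type": "Edm.DateTimeOffset",
    "filterable": True,
    "sortable": True,
}


def last_modified_filter(now: datetime, max_age_days: int = 365) -> str:
    """Build an OData filter matching documents modified within max_age_days.

    `now` must be timezone-aware; the cutoff is rendered in UTC.
    """
    cutoff = (now - timedelta(days=max_age_days)).astimezone(timezone.utc)
    stamp = cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{LAST_MODIFIED_FIELD['name']} ge {stamp}"


if __name__ == "__main__":
    print(last_modified_filter(datetime.now(timezone.utc)))
```

The resulting string would be passed as the filter parameter of a search request.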

    Here are the steps you can follow:

    1. Check the blob or file’s LastModified timestamp. After the initial run, the indexer only picks up a blob when this timestamp is newer than the last indexer run.
    2. If a blob you want indexed is not being picked up, you can force its LastModified timestamp to the current date and time by resaving the blob’s existing metadata. Do this only for blobs you want in the index, since it makes an old blob look recently modified.
    3. After updating the LastModified timestamp, run the indexer. It will process the blobs whose timestamp has changed since the previous run.
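
The 12-month cutoff from the question can be sketched in Python as follows. This is only an illustration under stated assumptions: BlobInfo and split_by_age are hypothetical names, the sample records are made up, and in a real setup the names and timestamps would come from a listing of the container.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, NamedTuple, Tuple


class BlobInfo(NamedTuple):
    """Hypothetical stand-in for one blob listing entry."""
    name: str
    last_modified: datetime  # timezone-aware LastModified timestamp


def split_by_age(
    blobs: Iterable[BlobInfo], now: datetime, max_age_days: int = 365
) -> Tuple[List[str], List[str]]:
    """Return (recent, stale) blob names relative to now - max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    recent: List[str] = []
    stale: List[str] = []
    for blob in blobs:
        # A blob modified exactly at the cutoff counts as recent.
        if blob.last_modified >= cutoff:
            recent.append(blob.name)
        else:
            stale.append(blob.name)
    return recent, stale


if __name__ == "__main__":
    utc = timezone.utc
    sample = [
        BlobInfo("2021/report.pdf", datetime(2021, 3, 1, tzinfo=utc)),
        BlobInfo("2023/report.pdf", datetime(2023, 7, 1, tzinfo=utc)),
        BlobInfo("2024/report.pdf", datetime(2024, 5, 1, tzinfo=utc)),
    ]
    recent, stale = split_by_age(sample, now=datetime(2024, 5, 31, tzinfo=utc))
    print("index:", recent)
    print("leave out:", stale)
```

What to do with the stale group is a separate decision. One option to check against the current documentation is the AzureSearch_Skip blob metadata property, which tells the blob indexer to skip a blob; another is to point the data source at a container or folder that holds only the recent blobs.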

    Hope that helps.

    -Grace