Cosmos Mongo (RU) aggregation pipeline multi stage query with limit is fetching all records in first match stage

Ankur Garg 0 Reputation points
2024-08-26T14:36:33.7433333+00:00

The aggregation pipeline query has following stages -

  1. Project (fields to return)
  2. Matching criteria
  3. Sort by field
  4. facet to limit the returned resultant documents.

The explain response's shardInformation is showing that total 30,838 documents were fetched. This is taking around 4373 millis for fetching single document.

Please help me with below queries:

[1] Why is all matching document fetched when the limit is specified in facet stage?

[2] Why is the streaming behavior not taking place as it is specified by Mongo API ?

[3] How can we avoid fetching all the documents and limit the document fetching as parameterized ?

Thanks in advance!

Mongo aggregation pipeline behavior thread - https://stackoverflow.com/a/54319812

Below is the complete aggregation pipeline query executed on Cosmos Mongo (RU) server with 10,000 RU/s throughput configured.

db.test_collection.explain('executionStats').aggregate([{"$project": {"accountId": 1, "priceCode": 1, "address": 1, "accountsReceivable": 1, "premises": 1, "name": 1, "phones": 1, "isSFA": 1, "siteId": 1, "location": 1, "_id": 1, "status": 1}},
{"$match": {"$and": [{"siteId": 1}, {"premises": "OFF"}]}},
{"$sort": {"name": 1}},
{"$facet": {"metadata": [{"$count": "totalCount"}], "records": [{"$skip": 0}, {"$limit": 1}]}}])
{
  command: "db.runCommand({explain: { 'aggregate' : 'test_collection', 'pipeline' : [{ '$project' : { 'accountId' : 1, 'priceCode' : 1, 'address' : 1, 'accountsReceivable' : 1, 'premises' : 1, 'name' : 1, 'phones' : 1, 'isSFA' : 1, 'siteId' : 1, 'location' : 1, '_id' : 1, 'status' : 1 } }, { '$match' : { '$and' : [{ 'siteId' : 1 }, { 'premises' : 'OFF' }] } }, { '$sort' : { 'name' : 1 } }, { '$facet' : { 'metadata' : [{ '$count' : 'totalCount' }], 'records' : [{ '$skip' : 0 }, { '$limit' : 1 }] } }], 'cursor' : { } }})",
  stages: [
    {
      stage: '$query',
      timeInclusiveMS: 3802.758,
      timeExclusiveMS: 3802.758,
      in: 30838,
      out: 30838,
      dependency: {
        getNextPageCount: 2,
        count: 2,
        time: 0,
        bytes: 19256935
      },
      details: {
        database: 'test_db',
        collection: 'test_collection',
        query: {
          '$and': [
            {
              siteId: {
                '$eq': 1
              }
            },
            {
              premises: {
                '$eq': 'OFF'
              }
            }
          ]
        },
        indexUsage: {
          pathsIndexed: {
            individualIndexes: [
              'premises',
              'siteId'
            ],
            compoundIndexes: []
          },
          pathsNotIndexed: {
            individualIndexes: [],
            compoundIndexes: []
          }
        },
        shardInformation: [
          {
            activityId: '70bc1b94-9dfc-4833-8d35-bff39971ad14',
            shardKeyRangeId: '[,FF) move next',
            durationMS: 931.4529,
            preemptions: 0,
            outputDocumentCount: 16260,
            retrievedDocumentCount: 16260
          },
          {
            activityId: 'b942eb32-eb0f-4615-be58-dd7bfbee93ef',
            shardKeyRangeId: '[,FF) move next',
            durationMS: 779.7555,
            preemptions: 0,
            outputDocumentCount: 14578,
            retrievedDocumentCount: 14578
          }
        ],
        queryMetrics: {
          retrievedDocumentCount: 30838,
          retrievedDocumentSizeBytes: 121117554,
          outputDocumentCount: 30838,
          outputDocumentSizeBytes: 19909571,
          indexHitRatio: 1,
          totalQueryExecutionTimeMS: 1599.34,
          queryPreparationTimes: {
            queryCompilationTimeMS: 0.26,
            logicalPlanBuildTimeMS: 0.13,
            physicalPlanBuildTimeMS: 0.31,
            queryOptimizationTimeMS: 0.02
          },
          indexLookupTimeMS: 0.89,
          documentLoadTimeMS: 338,
          vmExecutionTimeMS: 1503.05,
          runtimeExecutionTimes: {
            queryEngineExecutionTimeMS: 366.68,
            systemFunctionExecutionTimeMS: 306.69,
            userDefinedFunctionExecutionTimeMS: 0
          },
          documentWriteTimeMS: 797.48
        }
      }
    },
    {
      stage: '$projectInclude',
      timeInclusiveMS: 4120.5881,
      timeExclusiveMS: 317.8301,
      in: 30838,
      out: 30838,
      details: {
        set: {},
        include: [
          'accountId',
          'priceCode',
          'address',
          'accountsReceivable',
          'premises',
          'name',
          'phones',
          'isSFA',
          'siteId',
          'location',
          '_id',
          'status'
        ]
      }
    },
    {
      stage: '$sort',
      timeInclusiveMS: 4373.2708,
      timeExclusiveMS: 252.6827,
      in: 61676,
      out: 0,
      details: {
        sortKey: {
          name: 1
        }
      }
    },
    {
      stage: '$facet',
      timeInclusiveMS: 4373.3609,
      timeExclusiveMS: 0.0901,
      in: 30838,
      out: 1,
      details: {
        metadata: {
          stage: '$count',
          timeInclusiveMS: 4373.3185,
          timeExclusiveMS: 4373.3185,
          in: 30838,
          out: 1,
          details: {
            '$count': 'totalCount'
          }
        },
        records: {
          stage: '$limit',
          timeInclusiveMS: 4373.2649,
          timeExclusiveMS: 0.0088,
          in: 30838,
          out: 1,
          details: {
            limit: 1
          }
        }
      }
    }
  ],
  estimatedDelayFromRateLimitingInMilliseconds: 0,
  retriedDueToRateLimiting: false,
  totalRequestCharge: 2205.58,
  continuation: {
    hasMore: false
  },
  ActivityId: 'ebb2bf78-c5f3-4359-a225-9da3ddb8c1db',
  ok: 1
}
Azure Cosmos DB
Azure Cosmos DB
An Azure NoSQL database service for app development.
1,612 questions
{count} votes

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.