Node.js App Downtime Due to Cosmos DB Connection Issues

Robin Weiß 0 Reputation points
2024-08-01T08:48:43.24+00:00

Hi,

We are currently running a Node.js application that uses Cosmos DB as its database. Unfortunately, we experience downtimes of our app every few days. We have tracked the issue and noticed very long connection times (30 seconds or more) to Cosmos DB just before the end-users report the downtime. We only perform read requests on the Cosmos DB.

Additional Details:

  • App Configuration: We are using the "@azure/cosmos": "^4.0.0" SDK to connect to Cosmos DB.
  • Cosmos DB Configuration: Our Cosmos DB is configured with 1000 RU/s (manual).
  • Error Messages: During the downtime, there are no error messages in our Node.js application logs. The Cosmos DB logs are also empty during the downtime.
  • Regional Information: Our Cosmos DB instance is hosted in Germany West Central, and our Node.js app is running in the same region.
  • Recent Changes: There have been no recent changes to our app, database, or Azure environment.

In the Cosmos DB metrics, we observe a continuously increasing graph for total requests during working hours until the downtime starts, after which it drops to zero or near zero. After about an hour, the app becomes available again and functions normally. The Normalized RU/s is not an issue; we are consistently below 10%.

Has anyone experienced similar issues or have any suggestions on how to diagnose and resolve this problem? Any insights into potential causes and solutions would be greatly appreciated.

Thank you!

Azure Cosmos DB
An Azure NoSQL database service for app development.

1 answer

  1. Sai Raghunadh M 150 Reputation points Microsoft Vendor
    2024-08-28T05:35:24.53+00:00

    Hi @Robin Weiß,

    Thanks for the question and for using the MS Q&A platform.

    • The issue is likely caused by too many requests to Cosmos DB during working hours.
    • This can cause throttling (HTTP 429 "request rate too large" responses) and lead to long connection times.
    • To fix the issue, you can either increase the provisioned throughput for your Cosmos DB instance or optimize your application to reduce the number of requests to Cosmos DB.
    • You can also enable diagnostic logs for your Cosmos DB instance to get more information on the requests and response times.
    • For more information, you can refer to this documentation: Diagnose and troubleshoot Azure Cosmos DB request rate too large (429) exceptions.
    • If you need further assistance, you can reach out to Azure Support.

    Hope this helps. Do let us know if you have any further queries. If this answers your query, please click "Accept Answer" and "Yes" for "Was this answer helpful".
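
Since the application logs show no errors during the downtime, it may help to measure each Cosmos DB call on the client side and to check explicitly for throttling. The following is a minimal sketch, not a definitive implementation. The names `timedCall`, `slowMs`, and `log` are hypothetical and not part of the SDK, and the 1000 ms default threshold is an assumption. The sketch relies on errors from the "@azure/cosmos" SDK exposing the HTTP status on `err.code`.

```javascript
const { performance } = require("node:perf_hooks");

/**
 * Runs any async operation, measures its wall-clock latency, and logs
 * when it is slow or when it fails. `timedCall`, `slowMs`, and `log`
 * are illustrative names (assumptions), not part of @azure/cosmos.
 */
async function timedCall(label, operation, { slowMs = 1000, log = console.warn } = {}) {
  const start = performance.now();
  try {
    const result = await operation();
    const elapsedMs = performance.now() - start;
    if (elapsedMs > slowMs) {
      log(`[slow] ${label} took ${elapsedMs.toFixed(0)} ms`);
    }
    return { ok: true, result, elapsedMs, throttled: false };
  } catch (err) {
    const elapsedMs = performance.now() - start;
    // HTTP 429 means "request rate too large", i.e. the request was throttled.
    const throttled = err != null && Number(err.code) === 429;
    log(`[error] ${label} failed after ${elapsedMs.toFixed(0)} ms ` +
        `(code=${err && err.code}, throttled=${throttled})`);
    return { ok: false, error: err, elapsedMs, throttled };
  }
}

// Possible usage with an existing Container object (not executed here):
// const outcome = await timedCall("read item", () => container.item(id, partitionKey).read());
```

If slow calls show up in the log without any `throttled=true` entries, the cause is probably not the provisioned throughput, and the connection handling or the app host would be the next place to look.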
