Are Azure OpenAI Service Rate Limits Shared Between Deployments?
I have two Azure OpenAI resources, one called dev and the other prod, each with a GPT-4 deployment in UK South (using different model versions).
The prod deployment has a TPM limit of 35k and the dev deployment has a TPM limit of 10k.
While testing, I ran into rate-limit issues on the dev service, so I wrote a simple for loop that asks the model a basic question and waits one second between requests. The token count for these requests is negligible. Despite this, I hit a rate-limit error after the 10th request, even though the RPM limit is set to 60.
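For reference, the test loop looks roughly like this. It goes against the Azure OpenAI REST chat-completions endpoint; the resource name, deployment name, and key below are placeholders, and `max_tokens` is kept tiny so token usage stays negligible:

```python
import json
import time
import urllib.error
import urllib.request

# Placeholders -- substitute your own resource, deployment, and key.
ENDPOINT = "https://<your-resource>.openai.azure.com"
DEPLOYMENT = "<your-dev-deployment>"
API_KEY = "<your-api-key>"
API_VERSION = "2024-02-01"

def ask(question: str) -> int:
    """Send one small chat completion to the dev deployment.

    Returns the HTTP status code (200 on success, 429 when throttled).
    """
    url = (f"{ENDPOINT}/openai/deployments/{DEPLOYMENT}"
           f"/chat/completions?api-version={API_VERSION}")
    body = json.dumps({
        "messages": [{"role": "user", "content": question}],
        "max_tokens": 16,  # keep token consumption negligible
    }).encode()
    req = urllib.request.Request(
        url,
        data=body,
        headers={"api-key": API_KEY, "Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code

def burst_test(send=ask, n=20, delay=1.0):
    """Fire n small requests, one second apart by default.

    Returns the iteration number of the first 429 response,
    or None if no request was throttled.
    """
    for i in range(1, n + 1):
        if send("What is 2 + 2?") == 429:
            return i
        time.sleep(delay)
    return None
```

The `send` parameter is injectable purely so the loop logic can be exercised without live credentials; in my actual test it is just the `ask` call above, and `burst_test()` consistently reports a 429 on the 10th iteration.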
Is it possible that the two deployments are interfering with each other in terms of rate limits?