Azure Monitor timeout errors

Heorhii Teriaiev 65 Reputation points
2023-03-14T10:01:23.9666667+00:00

Hi there!

We run an application on AKS and collect Kubernetes events into Log Analytics. We use Azure Managed Grafana with Azure Monitor as the data source for alerting. Every day we get DatasourceError alerts due to timeouts between Grafana and Azure Monitor. The errors we see:

  • Error = failed to execute query A: request failed, status: 503 Service Unavailable
  • Error = failed to execute query A: request failed, status: 529
  • Error = failed to execute query A: Get ... Error = failed to execute query A: Get

I don't think we cause too much load. We have about 20 alerts in total, and the intervals between evaluations are 5-15 minutes. Is it expected that Azure Monitor times out or is unavailable every day? Is there anything we can tune to make it more reliable? Thank you in advance.

Azure Monitor
Azure Kubernetes Service (AKS)
Azure Managed Grafana

2 answers

  1. JimmySalian-2011 42,056 Reputation points
    2023-03-14T10:13:14.3733333+00:00

    Hi,

    This looks like a configuration issue or a timeout between Grafana and Azure Monitor. Have you raised this with Grafana? If not, I suggest asking them the same question. You can also check the timeout settings for the Azure Monitor APIs: https://video2.skills-academy.com/en-us/azure/azure-monitor/logs/api/timeouts

    Hope this helps

    JS

    ===

    Please Accept the answer if the information helped you. This will help us and others in the community as well.


  2. Bluematador 0 Reputation points
    2023-06-02T20:31:10.25+00:00

    Hi team,
    We are having the same issue here. The request interval is 5 minutes, and we don't execute more than 20 requests.

    We are using the Java SDK:

    "com.azure.resourcemanager" % "azure-resourcemanager" % "2.26.0"
    

    Here is the log:

    com.azure.core.management.exception.ManagementException: Status code 529, "{"cost":0,"timespan":"2023-06-02T13:47:00Z/2023-06-02T13:57:00Z","interval":"PT1M","value":[{"id":"/subscriptions/cb2e7cfe-f7f5-402a-87e6-4305c8fa48d3/resourceGroups/site-recovery-vault-rg-1/providers/Microsoft.Storage/storageAccounts/v5edrisiterecovasrcache/fileServices/default/providers/Microsoft.Insights/metrics/SuccessE2ELatency","type":"Microsoft.Insights/metrics","name":{"value":"SuccessE2ELatency","localizedValue":"Success E2E Latency"},"displayDescription":"The average end-to-end latency of successful requests made to a storage service or the specified API operation, in milliseconds. This value includes the required processing time within Azure Storage to read the request, send the response, and receive acknowledgment of the response.","unit":"MilliSeconds","timeseries":[],"errorMessage":"Query was throttled with reason: ServerBusy. Requested Metric:XStoreShoeboxwestus3|XStore|SuccessE2ELatency. Output Dimensions: ResourceId. Dimension Filters: AccountResourceId,ApiName,Authentication,Container,Environment,GeoType,microsoft.resourceGroupName,microsoft.resourceId,microsoft.resourceType,microsoft.subscriptionId,ObjectType,Region,ResourceId,Tenant. FirstOutputSamplingType: NullableAverage. Start time: 6/2/2023 1:47:00 PM End time: 6/2/2023 1:56:00 PM. Resolution: 00:01:00, Last Value Mode: False. Context:MetricsMP, CustomerId:Unknown. 
TraceID:{0ca441b8-42cb-4522-a89d-1ca77597f2d0};AzSubId=cb2e7cfe-f7f5-402a-87e6-4305c8fa48d3;AzResType=Microsoft.Storage/storageAccounts/fileServices;AzRegion=westus3;, IsUserQuery:1, ClientTimeout:00:01:40, ClientName:GatewayService-API->QueryServiceCoordinator, ElapsedTime(ms):0, Exception:Microsoft.Online.QueryService.Contracts.QueryThrottledException: Query service is overloaded and is not accepting new queries at this time: too many queries in flight.\r\n   at Microsoft.Online.Metrics.QueryServiceCoordinator.Throttling.QueryCapacityThrottlingManager.CheckCapacity(String clientName, Boolean throttlingBypassMode, Boolean isPercentileGroupByQuery, Boolean isDistinctCountGroupByQuery) in D:\\dbs\\sh\\acem\\0526_171745\\cmd\\4\\src\\QueryService\\QueryServiceCoordinator\\Throttling\\QueryCapacityThrottlingManager.cs:line 106\r\n   at Microsoft.Online.Metrics.QueryServiceCoordinator.QueryCoordinator.<ExecuteAsync>d__51.MoveNext() in D:\\dbs\\sh\\acem\\0526_171745\\cmd\\4\\src\\QueryService\\QueryServiceCoordinator\\QueryCoordinator.cs:line 412.","errorCode":"Throttled"}],"namespace":"Microsoft.Storage/storageAccounts/fileServices","resourceregion":"westus3"}"
    

    Thank you!!
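
The log above reports errorCode "Throttled" with reason ServerBusy, which suggests a transient server-side condition rather than a client misconfiguration. One common way to cope with transient failures like this is to retry with exponential backoff and jitter. The following is a minimal sketch of that idea in plain Java, not the SDK's own retry mechanism. The class name ThrottleRetry, the helper names, and the delay values are hypothetical. Treating 503 and 529 as retryable comes from the errors reported in this thread; including 429 is an assumption.

```java
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.IntSupplier;

class ThrottleRetry {
    // 503 and 529 are the statuses reported in this thread; 429 is an assumption.
    static final Set<Integer> RETRYABLE = Set.of(429, 503, 529);

    // Delay before retry number `attempt` (0-based): exponential growth,
    // capped at capMillis, with full jitter so clients do not retry in lockstep.
    static long backoffMillis(int attempt, long baseMillis, long capMillis) {
        long exp = Math.min(capMillis, baseMillis << Math.min(attempt, 20));
        return ThreadLocalRandom.current().nextLong(exp + 1);
    }

    // Invokes `call` (which returns an HTTP status code) up to maxAttempts times,
    // sleeping between attempts while the status is in RETRYABLE.
    // Returns the last status observed.
    static int callWithRetry(IntSupplier call, int maxAttempts,
                             long baseMillis, long capMillis) {
        int status = call.getAsInt();
        for (int attempt = 0;
             attempt < maxAttempts - 1 && RETRYABLE.contains(status);
             attempt++) {
            try {
                Thread.sleep(backoffMillis(attempt, baseMillis, capMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return status;
            }
            status = call.getAsInt();
        }
        return status;
    }

    public static void main(String[] args) {
        // Simulated responses standing in for a real metrics query.
        int[] responses = {529, 503, 200};
        int[] calls = {0};
        int status = callWithRetry(() -> responses[calls[0]++], 5, 100, 2000);
        System.out.println("final status " + status + " after " + calls[0] + " calls");
        // prints "final status 200 after 3 calls"
    }
}
```

In a real client, the IntSupplier would wrap the actual metrics request. If the SDK or Grafana data source in use exposes its own retry or timeout settings, configuring those is likely preferable to a hand-rolled loop like this one.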