How is AML's average GpuUtilization metric computed?

Will Boyd 6 Reputation points
2020-08-26T14:04:09.953+00:00

How is the "GpuUtilization" metric computed for an AML workspace? What are the inputs and what is the equation used to compute GpuUtilization?

The "Metrics" tab in the AML web portal shows a chart of GpuUtilization over a specified time period, along with the average GpuUtilization for that period. However, for some of my organization's AML workspaces, the reported average does not appear to match the data shown in the chart.

For example, the following screenshot shows the GpuUtilization for July 1-31, with the average GpuUtilization reported as 54.06. This is clearly much higher than what is shown in the chart. When I download the data from the chart (Share -> Download to Excel), I compute the average GpuUtilization to be ~11% in Excel. Why is there such a discrepancy?

[Screenshot: 20553-aml-metric-qna.png]

I have found similar discrepancies for other AML workspaces as well. However, the average GpuUtilization appears to be more accurate for the August 1-25 time period than it is for July 1-31. I wish to better understand how AML computes the average GpuUtilization over a time period so we can accurately account for my organization's AML GPU usage on a per-workspace basis.

Azure Machine Learning

1 answer

  1. GiftA-MSFT 11,166 Reputation points
    2020-08-27T14:49:02.87+00:00

    Hi, thanks for reaching out. GpuUtilization shows what percentage of the GPU was utilized on a given node during a run/job. One node can have one or more GPUs, and this metric is published per GPU per node. You can apply filters based on node to understand the computation better. Let me know if that helps or if you need further assistance. Thanks.
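
To illustrate why "published per GPU per node" can matter for the reported average, here is a minimal Python sketch. It is an assumption, not a confirmed description of how AML computes the value: it supposes the portal average is taken over every per-GPU sample in the time range, while an Excel average of the downloaded chart data is an average of per-bucket averages. The sample data, field layout, and function names are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples: (time_bucket, node, gpu_index, utilization_percent).
# Busy buckets have more samples because more nodes/GPUs are reporting.
samples = [
    ("day1", "node-0", 0, 90.0),
    ("day1", "node-0", 1, 80.0),
    ("day1", "node-1", 0, 70.0),
    ("day1", "node-1", 1, 60.0),
    ("day2", "node-0", 0, 10.0),
    ("day3", "node-0", 0, 0.0),
]


def sample_weighted_average(samples):
    """Average over every individual per-GPU sample in the range."""
    return mean(util for _, _, _, util in samples)


def average_of_bucket_averages(samples):
    """Average each time bucket first, then average those bucket values.

    This mirrors averaging a downloaded chart series in a spreadsheet.
    """
    buckets = defaultdict(list)
    for bucket, _, _, util in samples:
        buckets[bucket].append(util)
    return mean(mean(values) for values in buckets.values())


def average_for_node(samples, node):
    """Average restricted to one node, similar to applying a node filter."""
    return mean(util for _, n, _, util in samples if n == node)


print(round(sample_weighted_average(samples), 2))     # 51.67
print(round(average_of_bucket_averages(samples), 2))  # 28.33
print(average_for_node(samples, "node-0"))            # 45.0
print(average_for_node(samples, "node-1"))            # 65.0
```

In this made-up data the two averages differ because the busiest bucket contributes four samples and the idle buckets one each. If something similar is happening in a real workspace, splitting or filtering the metric by node should make the per-node contributions visible.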

