VM Performance Issue with Graphics-Heavy Workload

Thomas 1 Reputation point
2024-04-03T07:54:29.1666667+00:00

Hello!

We want to move our CAD program to the Azure cloud, as recommended by the vendor. The solution should support multiple users and requires a lot of graphics power for 3D CAD drawing.

Therefore we built the following environment (again as recommended by the vendor):

  • Host pool with 1 Standard NV12s_v3 VM (12 vCPUs, 112 GiB memory) with an attached Premium SSD with 7500 IOPS (250 MBps).
  • SMB file share with 1 TB (Premium tier) with 4500 IOPS (250 MiB/s)

All project data is stored on the file share, so every user can access it even if we scale out to more VMs.

Now moving to my problem:

Small projects work flawlessly. As soon as I open big and complex projects, the performance drops. If I rotate the 3D model, for example, all actions are delayed, sometimes by up to 3 seconds.

CPU, RAM, and graphics card utilization did not ramp up in my tests, so storage is most likely the limiting factor.
I tried storing a test project on the locally attached SSD, and the performance got even worse.
Next I scaled up the file share to 4.5 TB to get 7500 IOPS (550 MiB/s); again no difference.

In the next step I tried moving data to the file share and back, and I noticed something interesting.
When moving files from the VM to the file share, everything works as expected:

image

When copying the same file back to the VM, the speed is fine at first but then drops to 0 and slowly ramps up again:

User's image

I already checked Azure Monitor, but I found no useful clue as to what the problem could be.

Do you have any idea what the cause of the problem might be, or what I should try next to get better performance?

Thank you in advance!

Thomas

Azure Files
An Azure service that offers file shares in the cloud.
Azure Virtual Machines
An Azure service that is used to provision Windows and Linux virtual machines.

1 answer

  1. Silvia Wibowo 3,411 Reputation points Microsoft Employee
    2024-04-04T03:47:21.2133333+00:00

    Hi @Thomas, I understand that you're trying to get better disk performance for your VM.

    For the Standard_NV12s_v3 VM size, the maximum uncached disk throughput is 20,000 IOPS or 200 MB/s. By default, the host cache setting for a data disk is ReadOnly, which means the cache is used only for read operations. This is reflected in your charts above: when moving files from the VM to the file share, throughput can reach 387 MB/s. This is cached performance, as your VM is reading from the cache (the temporary local disk, 320 GB for NV12s_v3). Moving files to the VM requires write operations, and the VM writes directly to the disk, so the uncached disk throughput limit applies: at most 20,000 IOPS or 200 MB/s. As your Premium SSD can handle up to 7500 IOPS or 250 MB/s, the limiting factor for throughput is your VM. The "jagged" chart you're seeing shows the VM trying to catch up: it signals the writer to slow down while the I/O queue is cleared, and once the queue is empty, it signals to gradually ramp up again.
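To make the comparison above concrete, here is a minimal sketch in Python that takes the smaller of the VM limit and the disk limit for each metric. The numbers are simply the ones quoted in this thread, and the function names (`effective_limit`, `copy_seconds`) are made up for illustration; it is a back-of-the-envelope estimate, not a model of how Azure actually throttles I/O.

```python
# Figures quoted in this thread; treat them as assumptions, not authoritative specs.
VM_UNCACHED_LIMIT = {"iops": 20_000, "mbps": 200}  # Standard_NV12s_v3
DISK_LIMIT = {"iops": 7_500, "mbps": 250}          # attached Premium SSD


def effective_limit(vm: dict, disk: dict) -> dict:
    """Return, per metric, the lower of the two caps and which side imposes it.

    Ties are attributed to the disk.
    """
    result = {}
    for metric in ("iops", "mbps"):
        cap = min(vm[metric], disk[metric])
        bottleneck = "VM" if vm[metric] < disk[metric] else "disk"
        result[metric] = (cap, bottleneck)
    return result


def copy_seconds(file_size_mb: float, throughput_mbps: float) -> float:
    """Best-case time to write a file at a sustained throughput."""
    return file_size_mb / throughput_mbps


limits = effective_limit(VM_UNCACHED_LIMIT, DISK_LIMIT)
print(limits)                                  # mbps is capped by the VM, iops by the disk
print(copy_seconds(10_000, limits["mbps"][0]))  # 10,000 MB at 200 MB/s -> 50.0 seconds
```

With these figures, throughput is capped by the VM (200 MB/s) while IOPS is capped by the disk (7500), so a larger disk alone would not be expected to raise uncached write throughput.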

    Your options to improve performance:

    1. Configure the host cache for the data disk as ReadWrite. However, this has a caveat: it is only safe if your application properly handles writing cached data to persistent disks when needed. Using the ReadWrite cache with an application that doesn't persist the required data can lead to data loss if the VM crashes. This page has a screenshot of how to change the host cache setting.
    2. Change your VM size to one with higher I/O throughput. Is it a requirement to use the NVIDIA Tesla M60, or is there another GPU that you can use, e.g. the NCv2 or NCv3 series (Tesla P100 or V100)? Pricing should be a consideration, too.
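Whichever option you try, it may help to measure write throughput before and after the change rather than relying on the copy dialog. The following is a rough sketch, assuming Python is available on the VM; `measure_write_mib_per_s` is a hypothetical helper, not an Azure tool, and a dedicated benchmark utility will give more reliable numbers.

```python
import os
import tempfile
import time


def measure_write_mib_per_s(directory: str, total_mib: int = 256, chunk_mib: int = 4) -> float:
    """Write a temporary file into `directory` and return the observed MiB/s.

    The data is flushed and fsync'ed before the timer stops, so the figure
    includes the time needed to push the data out of the OS write buffer.
    """
    if total_mib < chunk_mib:
        raise ValueError("total_mib must be at least chunk_mib")
    chunk = os.urandom(chunk_mib * 1024 * 1024)  # random data, not trivially compressible
    chunks = total_mib // chunk_mib
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # clean up the test file
    return (chunks * chunk_mib) / elapsed

# Example call (the path is a placeholder for a folder on the data disk):
# print(measure_write_mib_per_s(r"F:\benchmark", total_mib=2048))
```

Running it against a folder on the data disk with a size well above a few hundred MiB, once with the current settings and once after the change, should show whether sustained writes still level off near the VM's uncached limit.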
