Scenario:
As part of the Enterprise Scale Migration (Azure Public Cloud to Azure Landing Zones), we are not able to access the compute in the Azure Machine Learning workspace.
We are seeing two main issues:
- When we open the Compute tab in AML Studio, it displays the following error: "Error: User does not have access to this compute instance. Please check if this compute instance is assigned to you and you have access to the workspace. Additionally, verify that you are on the correct network to access this compute instance."
- When we try to access the compute through the supported applications such as Jupyter, Jupyter Lab, VS Code (Web), etc., we are unable to open them and get the error below:
Unauthorized
User xxx@yyy.com does not have access to compute instance hariscompute1.
Please check if this compute instance is assigned to you and you have access to the workspace.
Additionally, verify that you are on the correct network to access this compute instance.
While troubleshooting with our network team, we found that traffic from our compute IP to the endpoint below (canadacentral.tundra.azureml.ms) on UDP port 5831 is not being allowed:
Compute instance: <region>.tundra.azureml.ms, UDP, port 5831
Following the MS doc provided, we have whitelisted the endpoints and ports listed in its "Azure ML compute instance and compute cluster hosts" section in our on-prem and Azure firewalls.
Running nslookup for the above endpoint, canadacentral.tundra.azureml.ms, returned a public IP, which we whitelisted in our Azure Firewall. After that, we were able to access the compute applications such as Jupyter, Jupyter Lab, etc.
But after an hour, the IP address of the endpoint changed dynamically. We also found the relevant MS doc for getting the list of IP addresses covered by the Azure Machine Learning service tag.
The above MS doc mentions that the list of IP addresses under a service tag can be updated weekly.
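To make the check concrete, here is a minimal Python sketch of how we could confirm whether a resolved endpoint IP is covered by a list of CIDR prefixes. It assumes the endpoint's addresses fall under the service tag's prefixes, which we have not confirmed. The function names `ip_in_prefixes` and `resolve_ipv4` are our own, and the prefixes in the example are documentation placeholder ranges, not real AzureMachineLearning prefixes.

```python
import ipaddress
import socket


def ip_in_prefixes(ip, prefixes):
    """Return True if `ip` falls inside any CIDR prefix in `prefixes`."""
    addr = ipaddress.ip_address(ip)
    for prefix in prefixes:
        net = ipaddress.ip_network(prefix, strict=False)
        # Only compare addresses and networks of the same IP version.
        if addr.version == net.version and addr in net:
            return True
    return False


def resolve_ipv4(hostname):
    """Resolve a hostname to its current IPv4 addresses (similar to nslookup)."""
    _, _, addresses = socket.gethostbyname_ex(hostname)
    return addresses


# Placeholder prefixes (RFC 5737 documentation ranges), for illustration only.
example_prefixes = ["203.0.113.0/24", "198.51.100.0/24"]
print(ip_in_prefixes("203.0.113.7", example_prefixes))
print(ip_in_prefixes("192.0.2.1", example_prefixes))
```

In practice, the addresses returned by `resolve_ipv4("canadacentral.tundra.azureml.ms")` would be passed to `ip_in_prefixes` together with the prefixes taken from the service tag list.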
Hence, we need a mechanism or automation that regularly checks the list of IPs under the AzureMachineLearning service tag for updates. This would help us proactively whitelist the required IPs so that existing applications do not break.
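As a rough idea of what we have in mind, the sketch below compares two snapshots of the service tag list and reports which prefixes were added or removed for one tag. It assumes the snapshots follow the layout of the downloadable service tags JSON file (a top-level `values` array whose entries have a `name` and a `properties.addressPrefixes` list). The helper names and the sample data are hypothetical, and fetching the snapshots and updating the firewall rules are left out.

```python
def prefixes_for_tag(service_tags_doc, tag_name):
    """Collect the address prefixes of the entry whose name equals `tag_name`."""
    for entry in service_tags_doc.get("values", []):
        if entry.get("name") == tag_name:
            return set(entry.get("properties", {}).get("addressPrefixes", []))
    return set()


def diff_prefixes(old_doc, new_doc, tag_name="AzureMachineLearning"):
    """Return (added, removed) prefixes for `tag_name` between two snapshots."""
    old = prefixes_for_tag(old_doc, tag_name)
    new = prefixes_for_tag(new_doc, tag_name)
    return sorted(new - old), sorted(old - new)


# Hypothetical snapshots using documentation ranges, not real Azure prefixes.
last_week = {"values": [{"name": "AzureMachineLearning",
                         "properties": {"addressPrefixes": ["192.0.2.0/24",
                                                            "203.0.113.0/24"]}}]}
this_week = {"values": [{"name": "AzureMachineLearning",
                         "properties": {"addressPrefixes": ["203.0.113.0/24",
                                                            "198.51.100.0/24"]}}]}

added, removed = diff_prefixes(last_week, this_week)
print("added:", added)      # added: ['198.51.100.0/24']
print("removed:", removed)  # removed: ['192.0.2.0/24']
```

A scheduled job could run this comparison weekly and raise an alert or a change request whenever `added` or `removed` is non-empty.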