Issue with Running Copied Jupyter Notebooks in Different Directories on Azure Machine Learning

Question

Hi everyone,

I've run into an issue when creating a job using azure.ai.ml.command:

from azure.ai.ml import command
from azure.ai.ml import Input

registered_model_name = "credit_defaults_model"

job = command(
    inputs=dict(
        data=Input(
            type="uri_file",
            path="https://azuremlexamples.blob.core.windows.net/datasets/credit_card/default_of_credit_card_clients.csv",
        ),
        test_train_ratio=0.2,
        learning_rate=0.25,
        registered_model_name=registered_model_name,
    ),
    code="./src/",  # location of source code
    command="python main.py --data ${{inputs.data}} --test_train_ratio ${{inputs.test_train_ratio}} --learning_rate ${{inputs.learning_rate}} --registered_model_name ${{inputs.registered_model_name}}",
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    display_name="credit_default_prediction",
)

# ml_client is assumed to be an authenticated azure.ai.ml.MLClient created earlier in the notebook
returned_job = ml_client.create_or_update(job)

(
Class AutoDeleteSettingSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class AutoDeleteConditionSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class BaseAutoDeleteSettingSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class IntellectualPropertySchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class ProtectionLevelSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class BaseIntellectualPropertySchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
---------------------------------------------------------------------------
HttpResponseError                         Traceback (most recent call last)
Cell In[6], line 1
----> 1 ml_client.create_or_update(job)

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azure/ai/ml/_ml_client.py:1221, in MLClient.create_or_update(self, entity, **kwargs)
   1205 def create_or_update(
   1206     self,
   1207     entity: T,
   1208     **kwargs,
   1209 ) -> T:
   1210     """Creates or updates an Azure ML resource.
   1211
   1212     :param entity: The resource to create or update.
   (...)
   1218         , ~azure.ai.ml.entities.Environment, ~azure.ai.ml.entities.Component, ~azure.ai.ml.entities.Datastore]
   1219     """
-> 1221     return _create_or_update(entity, self._operation_container.all_operations, **kwargs)

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/functools.py:889, in singledispatch.<locals>.wrapper(*args, **kw)
    885 if not args:
    886     raise TypeError(f'{funcname} requires at least '
    887                     '1 positional argument')
--> 889 return dispatch(args[0].__class__)(*args, **kw)

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azure/ai/ml/_ml_client.py:1280, in _(entity, operations, **kwargs)
   1277 @_create_or_update.register(Job)
   1278 def _(entity: Job, operations, **kwargs):
   1279     module_logger.debug("Creating or updating job")
-> 1280     return operations[AzureMLResourceType.JOB].create_or_update(entity, **kwargs)

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azure/core/tracing/decorator.py:94, in distributed_trace.<locals>.decorator.<locals>.wrapper_use_tracer(*args, **kwargs)
     92 span_impl_type = settings.tracing_implementation()
     93 if span_impl_type is None:
---> 94     return func(*args, **kwargs)
     96 # Merge span is parameter is set, but only if no explicit parent are passed
     97 if merge_span and not passed_in_parent:

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azure/ai/ml/_telemetry/activity.py:372, in monitor_with_telemetry_mixin.<locals>.monitor.<locals>.wrapper(*args, **kwargs)
    370 dimensions = {**parameter_dimensions, **(custom_dimensions or {})}
    371 with log_activity(logger, activity_name or f.__name__, activity_type, dimensions) as activityLogger:
--> 372     return_value = f(*args, **kwargs)
    373     if not parameter_dimensions:
    374         # collect from return if no dimensions from parameter
    375         activityLogger.activity_info.update(_collect_from_return_value(return_value))

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azure/ai/ml/operations/_job_operations.py:685, in JobOperations.create_or_update(self, job, description, compute, tags, experiment_name, skip_validation, **kwargs)
    679 if (rest_job_resource.properties.job_type == RestJobType.PIPELINE) or (
    680     hasattr(rest_job_resource.properties, "identity")
    681     and (isinstance(rest_job_resource.properties.identity, UserIdentity))
    682 ):
    683     self._set_headers_with_user_aml_token(kwargs)
--> 685 result = self._create_or_update_with_different_version_api(rest_job_resource=rest_job_resource, **kwargs)
    687 if is_local_run(result):
    688     ws_base_url = self._all_operations.all_operations[
    689         AzureMLResourceType.WORKSPACE
    690     ]._operation._client._base_url

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azure/ai/ml/operations/_job_operations.py:733, in JobOperations._create_or_update_with_different_version_api(self, rest_job_resource, **kwargs)
    730 if rest_job_resource.properties.job_type == RestJobType.SWEEP:
    731     service_client_operation = self.service_client_01_2024_preview.jobs
--> 733 result = service_client_operation.create_or_update(
    734     id=rest_job_resource.name,
    735     resource_group_name=self._operation_scope.resource_group_name,
    736     workspace_name=self._workspace_name,
    737     body=rest_job_resource,
    738     **kwargs,
    739 )
    741 return result

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azure/core/tracing/decorator.py:94, in distributed_trace.<locals>.decorator.<locals>.wrapper_use_tracer(*args, **kwargs)
     92 span_impl_type = settings.tracing_implementation()
     93 if span_impl_type is None:
---> 94     return func(*args, **kwargs)
     96 # Merge span is parameter is set, but only if no explicit parent are passed
     97 if merge_span and not passed_in_parent:

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azure/ai/ml/_restclient/v2023_04_01_preview/operations/_jobs_operations.py:760, in JobsOperations.create_or_update(self, resource_group_name, workspace_name, id, body, **kwargs)
    758     map_error(status_code=response.status_code, response=response, error_map=error_map)
    759     error = self._deserialize.failsafe_deserialize(_models.ErrorResponse, pipeline_response)
--> 760     raise HttpResponseError(response=response, model=error, error_format=ARMErrorFormat)
    762 if response.status_code == 200:
    763     deserialized = self._deserialize('JobBase', pipeline_response)

HttpResponseError: (UserError) Unable to create or update run kind_heart_d49sbldmm5 : Bad Request.
Code: UserError
Message: Unable to create or update run kind_heart_d49sbldmm5 : Bad Request.
Additional Information:
Type: ComponentName
Info: { "value": "managementfrontend" }
Type: Correlation
Info: { "value": { "operation": "912f2b0b94fd2967b7d76132ea049e53", "request": "e76b613b83f44296" } }
Type: Environment
Info: { "value": "westeurope" }
Type: Location
Info: { "value": "westeurope" }
Type: Time
Info: { "value": "2024-10-06T12:40:19.5228664+00:00" }
Type: InnerError
Info: { "value": { "code": "BadArgument", "innerError": { "code": "CreatRunBadRequest", "innerError": null } } }
)

I followed the suggested instructions but haven't been able to resolve the error. Here's what I've tried so far:

I changed different methods in the command job, modified the environment, and even switched the compute resources. However, the same error occurred.
I then copied and pasted the same tutorial (including the folder itself) into my notebook environment, but it still didn’t work.
Interestingly, when I ran the notebook directly without copying it, everything worked perfectly.

After some investigation, I realized the issue occurred when I copied the .ipynb file into the ./User/ directory—it didn't work there. However, when I copied the file into a subdirectory (./User/test), it worked as expected.

I'm aware of how to solve it now, but I don't understand why running the notebook in the main directory ./User/ causes it to fail. Could someone explain why this might be happening?

Any insights would be appreciated!

Answer

I think your problem might be related to the way Azure Machine Learning (AML) handles paths, permissions, or directory-specific configurations within the workspace.

The main ./User/ directory might have specific permission or access restrictions that prevent proper execution of the job. AML environments can sometimes have stricter access controls or security policies for certain directories (like the root ./User/), especially if the environment is shared across multiple users or instances.

If your ./User/ directory contains other files, configurations, or environment variables that are conflicting with the execution of your job, it might lead to issues. In contrast, a subdirectory (like ./User/test/) is likely more isolated and less prone to such conflicts.

AML might treat certain directories in a specific way, possibly reserving them for specific purposes (e.g., ./User/ for user configurations, cache, or logs). When you copy a notebook to ./User/, it might not be resolving the paths or resources as expected. Running it in a subdirectory avoids those conflicts because it isn't subject to the same handling rules.

Also, when you copy a notebook into the main ./User/ directory, relative paths might behave differently or cached resources might be reused incorrectly, especially if you're using arguments like code="./src/", which are resolved relative to where the notebook runs. In a subdirectory, the paths are more predictable and likely resolve correctly without interference from cached files or environment-specific settings.
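One way to check the relative-path theory is to resolve the code folder explicitly before building the job. The sketch below is only an illustration: resolve_code_dir is a hypothetical helper (not part of azure.ai.ml), and it assumes the relative code path is interpreted against the notebook's working directory.

```python
from pathlib import Path
from typing import Optional


def resolve_code_dir(code: str, base_dir: Optional[Path] = None) -> Path:
    """Resolve a relative code path against base_dir (default: current working directory).

    Hypothetical helper for debugging; raises if the folder does not exist.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    resolved = (base / code).resolve()
    if not resolved.is_dir():
        raise FileNotFoundError(f"code directory not found: {resolved}")
    return resolved


# Example use in the notebook (assumption: the job is built as in the question):
# code_dir = resolve_code_dir("./src/")
# print(code_dir)                      # shows which folder the job would pick up
# job = command(code=str(code_dir), ...)
```

Printing the resolved path in both ./User/ and ./User/test shows whether the two notebooks actually point at the folder you expect.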

I recommend running notebooks in isolated subdirectories and not placing them in root directories, particularly ones that might be shared or have environment-specific handling.
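If it helps, the copy into a subdirectory can be scripted. This is a rough sketch using only the standard library; stage_in_subdirectory and the example paths are hypothetical names chosen for illustration, not anything AML provides.

```python
import shutil
from pathlib import Path


def stage_in_subdirectory(notebook: Path, code_dir: Path, target: Path) -> Path:
    """Copy a notebook and its source folder into target, creating target if needed."""
    target.mkdir(parents=True, exist_ok=True)
    # Copy the notebook file itself.
    shutil.copy2(notebook, target / notebook.name)
    # Copy the source folder next to it so code="./src/" still resolves.
    shutil.copytree(code_dir, target / code_dir.name, dirs_exist_ok=True)
    return target


# Example (paths are assumptions based on the layout described in the question):
# stage_in_subdirectory(Path("tutorial.ipynb"), Path("src"), Path("test"))
```

Keeping the notebook and its src folder together in the same subdirectory means the relative code path stays valid after the move.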

Let me know if you'd like more help troubleshooting this further!
