No space left on device

Question

Hello,

I am very much a beginner with Azure and cloud computing.

Therefore, I have two questions:

Currently, I am trying to train Whisper on the Common Voice dataset that is available on Hugging Face. The dataset is a DatasetDict object. I saved it and converted it to JSON, so the dataset is now a bunch of JSON files in a folder. I created a data asset for this URI folder and saved it in a datastore. However, I am having the hardest time accessing and opening the folder using the data asset path. Is there any way to access the folder through the Python SDK?

The other issue is that while I am preparing my data for training (resampling and extracting features), I get a not-enough-storage error (I am doing this in a Jupyter notebook through the Python SDK). How can I overcome this issue?

I would really appreciate any assistance.

Thanks!

Layla

Answer

@Layla Bitar I think you can follow the guidance from this page to load your data asset and mount it as an input to your job. You should be able to do this with the snippet below:

from azure.ai.ml import command, Input, Output, MLClient
from azure.ai.ml.constants import AssetTypes, InputOutputModes
from azure.identity import DefaultAzureCredential

# Set your subscription, resource group and workspace name:
subscription_id = ""
resource_group = ""
workspace = ""

# connect to the AzureML workspace
ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)

data_type = AssetTypes.URI_FOLDER
# Mount the input read-only and the output read-write
input_mode = InputOutputModes.RO_MOUNT
output_mode = InputOutputModes.RW_MOUNT

input_path = "azureml://datastores/workspaceblobstore/paths/input-folder/" 
output_path = "azureml://datastores/workspaceblobstore/paths/output-folder/"

inputs = {
    "input_data": Input(type=data_type, path=input_path, mode=input_mode)
}

outputs = {
    "output_data": Output(type=data_type, path=output_path, mode=output_mode)
}

job = command(
    # -r is needed because the input is a folder, not a single file
    command="cp -r ${{inputs.input_data}} ${{outputs.output_data}}",
    inputs=inputs,
    outputs=outputs,
    environment="azureml://registries/azureml/environments/sklearn-1.1/versions/4",
    compute="cpu-cluster",
)

# Submit the command
ml_client.jobs.create_or_update(job)
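
Inside the job, the mounted input shows up as an ordinary local directory, so the training script can read the JSON files with standard Python. Below is a minimal sketch of that reading step. The function name iter_json_records is hypothetical, and it assumes the files are in JSON Lines format (one JSON object per line), which may not match how your files were exported.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_json_records(folder: str) -> Iterator[dict]:
    """Yield one record at a time from every *.json file under `folder`.

    Assumption: each file is JSON Lines (one JSON object per line).
    Adjust the parsing if your files hold a single JSON array instead.
    """
    for json_file in sorted(Path(folder).rglob("*.json")):
        with json_file.open("r", encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # skip blank lines
                    yield json.loads(line)
```

Because this is a generator, records are read one at a time instead of loading the whole dataset into memory. The job's command would pass ${{inputs.input_data}} to the script as an argument, and the script would hand that path to iter_json_records.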

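With respect to the space issue, I think you might be using DOWNLOAD as the input mode instead of MOUNT. Download mode copies the whole dataset onto the compute's local disk before the job starts, whereas mount mode reads files from the datastore as they are accessed. Try the MOUNT option and check whether it goes through.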
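
To confirm whether the local disk is really the bottleneck, you could check the free space before writing preprocessed features. This is only a rough sketch using the standard library; the helper names free_bytes and has_room are hypothetical, and which path to check (for example the notebook's working directory or a temp folder) depends on where your preprocessing writes its output.

```python
import shutil


def free_bytes(path: str = ".") -> int:
    """Return the free space, in bytes, on the disk that holds `path`."""
    return shutil.disk_usage(path).free


def has_room(path: str, needed_bytes: int) -> bool:
    """Return True if the disk holding `path` has at least `needed_bytes` free."""
    return free_bytes(path) >= needed_bytes
```

For example, calling has_room(".", 50 * 1024**3) before the feature-extraction step tells you whether about 50 GiB is available in the current directory's disk.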

If this answers your query, please click Accept Answer and Yes for "Was this answer helpful". If you have any further queries, do let us know.
