Azure Batch Endpoint job ends with TypeError: the JSON object must be str, bytes or bytearray, not MiniBatch

jestemtym777 20 Reputation points
2023-08-26T11:51:12.48+00:00

I'm trying to run an Azure ML batch endpoint job, but the job always fails with an error caused by the input (see below). I used a model created and trained in the Azure ML designer, as described on this page: https://video2.skills-academy.com/en-us/azure/machine-learning/how-to-deploy-model-designer?view=azureml-api-1

The error in "logs/azureml/stderrlogs.txt" is:

TypeError: the JSON object must be str, bytes or bytearray, not MiniBatch

My scoring script (auto-generated for the model):

import os
import json
from typing import List

from azureml.studio.core.io.model_directory import ModelDirectory
from pathlib import Path
from azureml.studio.modules.ml.score.score_generic_module.score_generic_module import ScoreModelModule
from azureml.designer.serving.dagengine.converter import create_dfd_from_dict
from collections import defaultdict
from azureml.designer.serving.dagengine.utils import decode_nan
from azureml.studio.common.datatable.data_table import DataTable


model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'trained_model_outputs')
schema_file_path = Path(model_path) / '_schema.json'
with open(schema_file_path) as fp:
    schema_data = json.load(fp)


def init():
    global model
    model = ModelDirectory.load(model_path).model


def run(data):
    data = json.loads(data)
    input_entry = defaultdict(list)
    for row in data:
        for key, val in row.items():
            input_entry[key].append(decode_nan(val))

    data_frame_directory = create_dfd_from_dict(input_entry, schema_data)
    score_module = ScoreModelModule()
    result, = score_module.run(
        learner=model,
        test_data=DataTable.from_dfd(data_frame_directory),
        append_or_result_only=True)
    return json.dumps({"result": result.data_frame.values.tolist()})

Definition of the input:

input = Input(type=AssetTypes.URI_FILE, path="azureml://subscriptions/$$$$$$$$/resourcegroups/$$$$$$$$$/workspaces/$$$$$/datastores/workspaceblobstore/paths/UI/2023-08-24_193934_UTC/samples.json")

Definition of the job:

job = ml_client.batch_endpoints.invoke(
    endpoint_name=endpoint.name,
    input=input,
)

I've read and watched various tutorials and documentation and tried the solutions they suggest, but nothing has helped. I've been stuck on this error for several hours, so I'm asking for help.


Accepted answer
  1. Konstantinos Passadis 17,381 Reputation points MVP
    2023-08-26T12:03:42.9833333+00:00

    Hello @jestemtym777

    Welcome to Microsoft QnA!

    Based on the error message you provided, the run function in your scoring script expects a JSON string, but it is being passed a MiniBatch object instead.

    Please try one of the following:

    1. Modify the scoring script to accept a MiniBatch: if the input will always be a MiniBatch object, change the run function to handle that type of input directly.
    2. Modify the input to the endpoint: make sure the input you provide to the endpoint matches what the run function in your scoring script expects.
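    A minimal sketch of option 1, assuming each element of the mini-batch is a path to a JSON file on disk (which is how batch deployments typically hand file inputs to run); the Azure-specific scoring logic is omitted and the function name/shape here is illustrative, not the exact SDK contract:

```python
import json

# Hypothetical adaptation of run(): treat the argument as a mini-batch,
# i.e. a list of file paths, rather than a single JSON string.
def run(mini_batch):
    all_rows = []
    for file_path in mini_batch:
        # Parse each file's contents instead of calling json.loads on the argument itself.
        with open(file_path) as fp:
            all_rows.extend(json.load(fp))
    # The parsed rows would then feed the existing scoring code (omitted here).
    return all_rows
```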

    I hope this helps!

    Kindly mark the answer as Accepted and Upvote in case it helped!

    Regards


1 additional answer

  1. jestemtym777 20 Reputation points
    2023-08-27T09:22:01.4566667+00:00

    The batch endpoint expects a JSON file, but for some reason Azure adds a hidden ".amlignore" file to the URI_FOLDER that the mini-batches were imported from. Azure couldn't process that file and therefore threw the errors. The content of my input folder:

    "minibatch": [".amlignore", "samples.json", "samples1.json"]