Azure Batch Endpoint job ends with TypeError: the JSON object must be str, bytes or bytearray, not MiniBatch

jestemtym777 20 Reputation points
2023-08-26T11:51:12.48+00:00

I'm trying to run an Azure ML batch endpoint job, but the job always fails with an error caused by the input (see below). I used a model created and trained in the Azure ML designer, as described on this page: https://video2.skills-academy.com/en-us/azure/machine-learning/how-to-deploy-model-designer?view=azureml-api-1

The error in "logs/azureml/stderrlogs.txt" is:

TypeError: the JSON object must be str, bytes or bytearray, not MiniBatch

My scoring script (auto-generated for the model):

import os
import json
from typing import List

from azureml.studio.core.io.model_directory import ModelDirectory
from pathlib import Path
from azureml.studio.modules.ml.score.score_generic_module.score_generic_module import ScoreModelModule
from azureml.designer.serving.dagengine.converter import create_dfd_from_dict
from collections import defaultdict
from azureml.designer.serving.dagengine.utils import decode_nan
from azureml.studio.common.datatable.data_table import DataTable


model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'trained_model_outputs')
schema_file_path = Path(model_path) / '_schema.json'
with open(schema_file_path) as fp:
    schema_data = json.load(fp)


def init():
    global model
    model = ModelDirectory.load(model_path).model


def run(data):
    data = json.loads(data)
    input_entry = defaultdict(list)
    for row in data:
        for key, val in row.items():
            input_entry[key].append(decode_nan(val))

    data_frame_directory = create_dfd_from_dict(input_entry, schema_data)
    score_module = ScoreModelModule()
    result, = score_module.run(
        learner=model,
        test_data=DataTable.from_dfd(data_frame_directory),
        append_or_result_only=True)
    return json.dumps({"result": result.data_frame.values.tolist()})

Definition of the input:

input = Input(type=AssetTypes.URI_FILE, path="azureml://subscriptions/$$$$$$$$/resourcegroups/$$$$$$$$$/workspaces/$$$$$/datastores/workspaceblobstore/paths/UI/2023-08-24_193934_UTC/samples.json")

Definition of the job:

job = ml_client.batch_endpoints.invoke(
    endpoint_name=endpoint.name,
    input=input,
)

I've read and watched various tutorials and documentation and tried the solutions they suggest, but nothing has helped. I've been stuck on this error for several hours, so I'm asking for help.


Accepted answer
  1. Konstantinos Passadis 17,381 Reputation points MVP
    2023-08-26T12:03:42.9833333+00:00

    Hello @jestemtym777

    Welcome to Microsoft QnA!

    Based on the error message you provided, the run function in your scoring script expects a JSON string, but it is being passed a MiniBatch object instead.

    Please try one of the following:

    1. Modify the scoring script to accept a MiniBatch: if the input will always be a MiniBatch object, change the run function to handle that type of input directly.
    2. Modify the input to the endpoint: make sure the input you provide to the endpoint matches what the run function in your scoring script expects.
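    A minimal sketch of option 1, assuming each element of the mini-batch is a path to a JSON file on disk (which is how batch deployments typically hand file inputs to run); the Azure-specific scoring logic is omitted and the function name/shape here is illustrative, not the exact SDK contract:

```python
import json

# Hypothetical adaptation of run(): treat the argument as a mini-batch,
# i.e. a list of file paths, rather than a single JSON string.
def run(mini_batch):
    all_rows = []
    for file_path in mini_batch:
        # Parse each file's contents instead of calling json.loads on the argument itself.
        with open(file_path) as fp:
            all_rows.extend(json.load(fp))
    # The parsed rows would then feed the existing scoring code (omitted here).
    return all_rows
```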

    I hope this helps!

    Kindly mark the answer as Accepted and Upvote in case it helped!

    Regards


1 additional answer

  1. jestemtym777 20 Reputation points
    2023-08-27T09:22:01.4566667+00:00

    The batch endpoint expects a JSON file, but for some reason Azure adds a hidden ".amlignore" file to the URI_FOLDER that the mini-batches were imported from. Azure couldn't process that file and therefore threw the errors. The content of my input folder:

    "minibatch": [".amlignore", "samples.json", "samples1.json"]