Conditional Hyperparameters in Sweep Job

Tom Owen 0 Reputation points
2024-06-28T13:20:33.82+00:00

Hello,

I'm working on a hyperparameter tuning job in Azure ML using the CLI/YAML schema. I want to include different optimizers in the hyperparameter search space and also tune their individual hyperparameters, but I'm concerned that doing this will cause the sweep job to trial hyperparameters that aren't used by the chosen optimizer, wasting compute time and resources.

Let's say I have the following function to do this:

from keras.optimizers import SGD, Adam, RMSprop, Adagrad, Adadelta

def get_optimizer(args):
    """Build the requested Keras optimizer from the parsed command-line arguments."""
    name = args.optimizer
    if name == 'SGD':
        return SGD(learning_rate=args.learning_rate, momentum=args.momentum)
    elif name == 'Adam':
        return Adam(learning_rate=args.learning_rate, beta_1=args.beta_1, beta_2=args.beta_2, epsilon=args.epsilon)
    elif name == 'RMSprop':
        return RMSprop(learning_rate=args.learning_rate, rho=args.rho, epsilon=args.epsilon)
    elif name == 'Adagrad':
        return Adagrad(learning_rate=args.learning_rate, epsilon=args.epsilon)
    elif name == 'Adadelta':
        return Adadelta(learning_rate=args.learning_rate, rho=args.rho, epsilon=args.epsilon)
    else:
        raise ValueError(f"Unknown optimizer: {name}")
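
For context, each of these values reaches the training script as a command-line argument passed by the sweep job. A rough sketch of the wiring (argument names match the function above; the defaults are just placeholders):

import argparse

def parse_args():
    # Every optimizer hyperparameter is exposed as a script argument so the
    # sweep job can pass sampled values on the command line.
    parser = argparse.ArgumentParser()
    parser.add_argument('--optimizer', type=str, default='Adam')
    parser.add_argument('--learning_rate', type=float, default=0.001)
    parser.add_argument('--momentum', type=float, default=0.9)   # only used by SGD
    parser.add_argument('--beta_1', type=float, default=0.9)     # only used by Adam
    parser.add_argument('--beta_2', type=float, default=0.999)   # only used by Adam
    parser.add_argument('--rho', type=float, default=0.9)        # only used by RMSprop / Adadelta
    parser.add_argument('--epsilon', type=float, default=1e-7)
    return parser.parse_args()

args = parse_args()
optimizer = get_optimizer(args)
# model.compile(optimizer=optimizer, ...) and training carry on as usual from here.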

The hyperparameter 'momentum' is only relevant for Stochastic Gradient Descent (SGD), so if I use a 'choice' input for the sweep job and it chooses 'Adam' during the hyperparameter sweep, I don't want it to trial lots of different values for 'momentum', as this will waste trials and potentially confuse a Bayesian optimization algorithm.
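
At the moment, the only way I know to express this is a flat search space where every parameter gets sampled on every trial regardless of which optimizer was chosen, e.g. (remaining parameters declared the same way):

search_space:
  optimizer:
    type: choice
    values: ['SGD', 'Adam', 'RMSprop', 'Adagrad', 'Adadelta']
  learning_rate:
    type: uniform
    min_value: 0.0001
    max_value: 0.01
  momentum:   # still sampled even when the trial picks Adam, RMSprop, etc.
    type: uniform
    min_value: 0.5
    max_value: 0.9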

Ideally, I'd like to be able to define the search space in the sweep job's YAML schema like this instead:

search_space:
  optimizer:
    type: choice
    values: ['SGD', 'Adam', 'RMSprop', 'Adagrad', 'Adadelta']
  learning_rate:
    type: uniform
    min_value: 0.0001
    max_value: 0.01
  beta_1:
    type: uniform
    min_value: 0.85
    max_value: 0.99
    conditional:
      - parent: optimizer
        value: 'Adam'
  beta_2:
    type: uniform
    min_value: 0.9
    max_value: 0.999
    conditional:
      - parent: optimizer
        value: 'Adam'
  momentum:
    type: uniform
    min_value: 0.5
    max_value: 0.9
    conditional:
      - parent: optimizer
        value: 'SGD'

Does anyone know of a different way I can achieve what I'm after, or do I need to rethink the hyperparameter tuning strategy and use something like Hyperopt in Databricks instead?

Many thanks

Azure Machine Learning