In AutoML, we use a 5% minority-class share as the threshold for classifying a dataset as imbalanced or not. This is a heuristic, and it is one guideline that the Guardrails produce in answer to the question “At x% threshold level, is the dataset balanced?”. Imbalance cannot be classified absolutely in all cases: depending on the dataset and its size and distribution, 5%, 10%, or an even higher minority share may mean imbalance, whereas in a very large dataset the minority class may still have enough training samples for the model to learn and achieve a reasonable score on an imbalance-appropriate metric such as weighted AUC or balanced accuracy. The current Guardrails therefore serve the goal of surfacing “substantial” imbalance to the user, so that the user can take any of the following measures:
• When the user knows (either from their knowledge of their own data or from the Guardrails) that there is imbalance, Automated ML provides an option in the Automated ML config to supply sample weights: a user-specified array that assigns a weight to each sample. That way the user can weight the minority class more heavily when submitting the data through the Automated ML config. We will soon provide a weighting option for imbalanced classes from within AutoML that will be activated automatically when imbalance is detected.
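As an illustration of the threshold check and the sample-weight array described above, here is a minimal sketch in plain Python. It is not the Guardrails implementation. The function names are hypothetical, the strict "below 5%" comparison is an assumption, and the inverse-frequency weighting scheme is one common choice rather than anything the text prescribes.

```python
from collections import Counter


def minority_fraction(labels):
    """Share of samples that belong to the least frequent class."""
    counts = Counter(labels)
    return min(counts.values()) / len(labels)


def is_imbalanced(labels, threshold=0.05):
    """Heuristic check: flag the dataset when the minority share is
    below the threshold (assumed here to be a strict comparison)."""
    return minority_fraction(labels) < threshold


def inverse_frequency_weights(labels):
    """One weight per sample, inversely proportional to class frequency.

    Each sample of class c gets n / (k * count_c), where n is the number
    of samples and k the number of classes, so every class contributes
    the same total weight and the weights sum to n.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]


if __name__ == "__main__":
    labels = [0] * 97 + [1] * 3  # toy data: 3% minority class
    if is_imbalanced(labels):
        weights = inverse_frequency_weights(labels)
        print(weights[0], weights[-1])  # majority weight, minority weight
```

The resulting list has one entry per training row, in row order, which is the shape a per-sample weight array needs. How that array is handed to the Automated ML config depends on the SDK version in use and is not shown here.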