The purpose of this maturity model is to help clarify the Machine Learning Operations (MLOps) principles and practices. The maturity model shows the continuous improvement in the creation and operation of a production-level machine learning application environment. You can use it to establish the progressive requirements needed to measure the maturity of a machine learning production environment and its associated processes.
Maturity model
The MLOps maturity model helps clarify the Development Operations (DevOps) principles and practices necessary to run a successful MLOps environment. It's intended to identify gaps in an existing organization's attempt to implement such an environment. It's also a way to show you how to grow your MLOps capability in increments rather than overwhelm you with the requirements of a fully mature environment. Use it as a guide to:
Estimate the scope of the work for new engagements.
Establish realistic success criteria.
Identify deliverables you'll hand over at the conclusion of the engagement.
As with most maturity models, the MLOps maturity model qualitatively assesses people/culture, processes/structures, and objects/technology. As the maturity level increases, the probability increases that incidents or errors will lead to improvements in the quality of the development and production processes.
The MLOps maturity model encompasses five levels of technical capability, from Level 0 (No MLOps) through Level 4 (Full MLOps with automated retraining). At full maturity:

- Production systems provide information on how to improve and, in some cases, improve automatically with new models.
- The system approaches zero downtime.
- Model training and testing are automated.
- Deployed models emit verbose, centralized metrics.
The tables that follow identify the detailed characteristics for each level of process maturity. The model will continue to evolve.
Level 0: No MLOps
| People | Model Creation | Model Release | Application Integration |
| ------ | -------------- | ------------- | ----------------------- |
| Data scientists: siloed, not in regular communication with the larger team<br><br>Data engineers (if they exist): siloed, not in regular communication with the larger team<br><br>Software engineers: siloed, receive the model remotely from the other team members | Data gathered manually<br><br>Compute is likely not managed<br><br>Experiments aren't predictably tracked<br><br>End result might be a single model file manually handed off with inputs/outputs (see the sketch after this table) | Manual process<br><br>Scoring script might be manually created well after experiments, not version controlled<br><br>Release handled by data scientist or data engineer alone | Heavily reliant on data scientist expertise to implement<br><br>Manual releases each time |
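To make Level 0 concrete, the following is a minimal Python sketch of the manual handoff this level describes: a model trained ad hoc and saved to a single file that is passed to the application team by hand. The scikit-learn and joblib tooling, the synthetic data, and the file name `model.pkl` are all illustrative assumptions, not part of the model itself.

```python
# A Level 0 "release": one model file created outside any tracked process.
# scikit-learn, joblib, and the synthetic data are illustrative assumptions.
import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Data gathered manually (synthesized here for illustration).
X = np.random.rand(100, 4)
y = (X[:, 0] > 0.5).astype(int)

model = LogisticRegression().fit(X, y)

# The handoff: a single file copied to a shared drive or emailed, with the
# expected inputs/outputs described informally, if at all.
joblib.dump(model, "model.pkl")
```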
Level 1: DevOps but no MLOps
| People | Model Creation | Model Release | Application Integration |
| ------ | -------------- | ------------- | ----------------------- |
| Data scientists: siloed, not in regular communication with the larger team<br><br>Data engineers (if they exist): siloed, not in regular communication with the larger team<br><br>Software engineers: siloed, receive the model remotely from the other team members | Data pipeline gathers data automatically (see the sketch after this table)<br><br>Compute might or might not be managed<br><br>Experiments aren't predictably tracked<br><br>End result might be a single model file manually handed off with inputs/outputs | Manual process<br><br>Scoring script might be manually created well after experiments, likely version controlled<br><br>Handed off to software engineers | Basic integration tests exist for the model<br><br>Heavily reliant on data scientist expertise to implement the model<br><br>Releases automated<br><br>Application code has unit tests |
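At Level 1, the main gain over Level 0 is automation around the data and the release. The sketch below shows one way a scheduled job (run by cron or a pipeline runner, for example) might gather data automatically; the source URL and output directory are hypothetical placeholders.

```python
# A Level 1 sketch: data is pulled on a schedule rather than gathered by hand.
# The source URL and output directory are hypothetical placeholders.
import datetime
import os
import urllib.request

SOURCE_URL = "https://example.com/daily-export.csv"  # hypothetical endpoint

def gather_data(output_dir: str = "data/raw") -> str:
    """Download today's export to a dated file so runs don't overwrite each other."""
    os.makedirs(output_dir, exist_ok=True)
    today = datetime.date.today().isoformat()
    destination = os.path.join(output_dir, f"export-{today}.csv")
    urllib.request.urlretrieve(SOURCE_URL, destination)
    return destination

if __name__ == "__main__":
    print(f"Wrote {gather_data()}")
```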
Level 2: Automated Training
| People | Model Creation | Model Release | Application Integration |
| ------ | -------------- | ------------- | ----------------------- |
| Data scientists: working directly with data engineers to convert experimentation code into repeatable scripts/jobs<br><br>Data engineers: working with data scientists<br><br>Software engineers: siloed, receive the model remotely from the other team members | Data pipeline gathers data automatically<br><br>Compute managed<br><br>Experiment results tracked (see the sketch after this table)<br><br>Both training code and resulting models are version controlled | Manual release<br><br>Scoring script is version controlled with tests<br><br>Release managed by the software engineering team | Basic integration tests exist for the model<br><br>Heavily reliant on data scientist expertise to implement the model<br><br>Application code has unit tests |
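The defining change at Level 2 is that experimentation becomes a repeatable, tracked process rather than a one-off notebook. The sketch below uses MLflow as one common tracking tool; it stands in for whatever tracker you use, and the model, parameters, and synthetic data are illustrative assumptions.

```python
# A Level 2 sketch: a repeatable, version-controlled training script whose
# runs are tracked. MLflow is one possible tracker; the model, parameters,
# and synthetic data are illustrative.
import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X = np.random.rand(500, 8)                     # placeholder for pipeline output
y = (X[:, 0] + X[:, 1] > 1).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    params = {"n_estimators": 100, "max_depth": 5}
    model = RandomForestClassifier(**params).fit(X_train, y_train)

    # Every run records its parameters, metrics, and resulting model, so
    # results are reproducible instead of being lost in a notebook.
    mlflow.log_params(params)
    mlflow.log_metric("test_accuracy", model.score(X_test, y_test))
    mlflow.sklearn.log_model(model, "model")
```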
Level 3: Automated Model Deployment
| People | Model Creation | Model Release | Application Integration |
| ------ | -------------- | ------------- | ----------------------- |
| Data scientists: working directly with data engineers to convert experimentation code into repeatable scripts/jobs<br><br>Data engineers: working with data scientists and software engineers to manage inputs/outputs<br><br>Software engineers: working with data engineers to automate model integration into application code | Data pipeline gathers data automatically<br><br>Compute managed<br><br>Experiment results tracked<br><br>Both training code and resulting models are version controlled | Automatic release<br><br>Scoring script is version controlled with tests<br><br>Release managed by continuous delivery (CI/CD) pipeline | Unit and integration tests for each model release (see the sketch after this table)<br><br>Less reliant on data scientist expertise to implement the model<br><br>Application code has unit/integration tests |
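At Level 3, the release itself is gated by tests that the CI/CD pipeline runs on every model release. The pytest sketch below assumes a hypothetical scoring module `score` exposing a `run(payload)` entry point that raises `ValueError` on malformed input; substitute your actual scoring interface.

```python
# A Level 3 sketch: tests for a hypothetical scoring script, run by the CI/CD
# pipeline on every model release. `score.run(payload)` is an assumed
# interface, not a real library API.
import pytest

import score  # hypothetical scoring module under test

def test_scoring_returns_prediction_for_valid_input():
    payload = {"features": [0.1, 0.2, 0.3, 0.4]}
    result = score.run(payload)
    assert "prediction" in result

def test_scoring_rejects_malformed_input():
    with pytest.raises(ValueError):
        score.run({"features": "not-a-list"})
```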
Level 4: Full MLOps Automated Retraining
| People | Model Creation | Model Release | Application Integration |
| ------ | -------------- | ------------- | ----------------------- |
| Data scientists: working directly with data engineers to convert experimentation code into repeatable scripts/jobs; working with software engineers to identify markers for data engineers<br><br>Data engineers: working with data scientists and software engineers to manage inputs/outputs<br><br>Software engineers: working with data engineers to automate model integration into application code; implementing post-deployment metrics gathering | Data pipeline gathers data automatically<br><br>Retraining triggered automatically based on production metrics (see the sketch after this table)<br><br>Compute managed<br><br>Experiment results tracked<br><br>Both training code and resulting models are version controlled | Automatic release<br><br>Scoring script is version controlled with tests<br><br>Release managed by continuous integration and continuous delivery (CI/CD) pipeline | Unit and integration tests for each model release<br><br>Less reliant on data scientist expertise to implement the model |
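Level 4 closes the loop: centralized production metrics drive retraining without a human in the loop. The sketch below outlines that trigger logic; `fetch_production_accuracy`, `trigger_retraining_pipeline`, and the threshold are hypothetical stubs to be wired to your own metrics store and pipeline orchestrator.

```python
# A Level 4 sketch: a monitoring job compares live model quality against a
# threshold and triggers retraining automatically. Both helper functions are
# hypothetical stubs, and the threshold is illustrative.
ACCURACY_FLOOR = 0.90  # illustrative quality threshold

def fetch_production_accuracy() -> float:
    """Read the latest accuracy from centralized production metrics (stub)."""
    raise NotImplementedError("wire this to your metrics store")

def trigger_retraining_pipeline() -> None:
    """Kick off the automated training pipeline (stub)."""
    raise NotImplementedError("wire this to your orchestrator")

def check_and_retrain() -> None:
    accuracy = fetch_production_accuracy()
    if accuracy < ACCURACY_FLOOR:
        # Production metrics, not a human, decide when to retrain.
        trigger_retraining_pipeline()
```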