Artificial Intelligence - ML Operations

Overview

We define Machine Learning Operations (MLOps) as any operation in an environment where AI assets are treated jointly with all other software infrastructure assets in a continuous improvement context. MLOps essentially starts at the very beginning of the AI pipeline, during the initial design of the AI system, and follows the lifecycle of the project up to productionisation. MLOps involves concepts such as iterative development, continuous integration, continuous training, continuous deployment and improvement, versioning, reproducibility, and monitoring of every step of the pipeline. By applying proper MLOps best practices, an MLOps engineer contributes significantly to reducing the accumulation of technical debt during all phases of the Artificial Intelligence pipeline.

Automation

The objective of the MLOps process is to fully automate the entire Artificial Intelligence pipeline, from its data engineering steps all the way to deployment and productionisation of the AI models. Triggering systems respond to events that can emerge at any phase of the pipeline and effectively re-instantiate it, up to and including redeployment of the AI models. Furthermore, a complete lineage of the entire process needs to be generated, allowing the AI team to inspect and monitor any change and its effects on the pipeline.
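As an illustration, the sketch below maps incoming events to the pipeline stages they re-instantiate and records what was re-run and why. It is a minimal sketch in plain Python using only the standard library; the event names, stage functions, and in-memory lineage log are illustrative assumptions, not part of any specific MLOps product.

    # A minimal sketch of event-based triggering with lineage capture.
    # Event names, stage functions and the in-memory lineage log are
    # illustrative assumptions, not part of any specific product.
    import datetime
    import json

    LINEAGE_LOG = []  # a real system would write to a metadata store instead

    def record_lineage(event, stage_names):
        """Keep an auditable record of what was re-run and why."""
        LINEAGE_LOG.append({
            "timestamp": datetime.datetime.utcnow().isoformat(),
            "trigger_event": event,
            "stages_rerun": stage_names,
        })

    def ingest_data():
        print("re-running data ingestion")

    def train_model():
        print("re-training model")

    def deploy_model():
        print("re-deploying model")

    # Which pipeline stages each event re-instantiates (an illustrative policy).
    TRIGGER_POLICY = {
        "new_training_data": [ingest_data, train_model, deploy_model],
        "model_drift_detected": [train_model, deploy_model],
        "code_change_merged": [ingest_data, train_model, deploy_model],
    }

    def handle_event(event_name):
        stages = TRIGGER_POLICY.get(event_name, [])
        for stage in stages:
            stage()
        record_lineage(event_name, [s.__name__ for s in stages])

    if __name__ == "__main__":
        handle_event("model_drift_detected")
        print(json.dumps(LINEAGE_LOG, indent=2))

In a production setting the lineage records would land in a metadata store rather than a Python list, but the principle is the same: every re-run is traceable back to the event that caused it.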

Getting to fully automated MLOps is usually an iterative process itself: parts of the pipeline are first automated through appropriate orchestration, and as more and more of the process is automated, the phases integrate into a fully automated pipeline. Such a pipeline should be able to build, train, test, and deploy every step of the AI process.
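A minimal sketch of such orchestration is given below, assuming a simple linear chain of build, train, test, and deploy stages. The stage names and their boolean pass/fail results are illustrative; a real orchestrator would express this as a DAG with retries and artifact passing between steps.

    # A minimal sketch of pipeline orchestration as a linear chain of stages.
    # Stage names and their pass/fail return values are illustrative only.

    def build():
        print("packaging pipeline components")
        return True

    def train():
        print("training a candidate model")
        return True

    def test():
        print("validating the candidate against a holdout set")
        return True  # in practice: compare metrics to a promotion threshold

    def deploy():
        print("promoting the model to the serving environment")
        return True

    PIPELINE = [build, train, test, deploy]

    def run_pipeline():
        for stage in PIPELINE:
            if not stage():
                print(f"pipeline halted at stage: {stage.__name__}")
                return False
        print("pipeline completed end to end")
        return True

    if __name__ == "__main__":
        run_pipeline()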

Lineage

Continuous experimentation is naturally embedded in AI work, where research-centric practices such as running multiple experiment settings simultaneously are common. A proper MLOps environment should provide the MLOps engineer with tools that support full lineage of the experimentation process: the steps, parameters, hyperparameters, dataset versions, model artifacts, and environment that preceded the decision to productionise a pipeline. Reproducibility and versioning of experiments are key and can lead to a significant increase in velocity during productionisation.
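The sketch below illustrates one way such lineage could be captured using only the Python standard library. The record fields (dataset fingerprint, environment details, metrics) and the lineage.jsonl output file are assumptions standing in for a proper metadata tracking tool.

    # A minimal sketch of experiment lineage capture. Field names such as
    # dataset_version and the lineage.jsonl output file are assumptions
    # standing in for a proper metadata tracking tool.
    import datetime
    import hashlib
    import json
    import platform

    def log_experiment(params, hyperparams, dataset_path, metrics,
                       out_file="lineage.jsonl"):
        # Fingerprint the dataset so the exact version used can be traced later.
        with open(dataset_path, "rb") as f:
            dataset_fingerprint = hashlib.sha256(f.read()).hexdigest()
        record = {
            "timestamp": datetime.datetime.utcnow().isoformat(),
            "params": params,
            "hyperparams": hyperparams,
            "dataset_version": dataset_fingerprint,
            "environment": {
                "python": platform.python_version(),
                "platform": platform.platform(),
            },
            "metrics": metrics,
        }
        with open(out_file, "a") as f:
            f.write(json.dumps(record) + "\n")
        return record

    if __name__ == "__main__":
        # Create a tiny illustrative dataset so the example is self-contained.
        with open("train.csv", "w") as f:
            f.write("x,y\n1,2\n3,4\n")
        log_experiment(
            params={"features": ["x"], "target": "y"},
            hyperparams={"learning_rate": 0.01, "epochs": 10},
            dataset_path="train.csv",
            metrics={"rmse": 0.42},
        )

Because every record ties metrics back to a specific dataset fingerprint, hyperparameters, and environment, any productionised result can be reproduced from the logged lineage.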

Deliverables and components

Every stage of the MLOps pipeline has specific deliverables that are expected from the MLOps engineer. These stages include:

  • Experimentation and development: Source code that automates each step of the AI pipeline.
  • Continuous Integration: Deployable pipeline components, packages, and services.
  • Continuous Delivery: Full deployment of the entire pipeline and new model services available in production.
  • Automated Triggering: Event-based triggering of parts or all of the pipeline.
  • Continuous Monitoring: Data collection on the live performance of AI systems in production (a minimal monitoring sketch follows this list).
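To make the Continuous Monitoring deliverable concrete, the sketch below collects live prediction scores from a deployed model and compares their mean against the score observed at training time. The mean-difference drift check and the threshold value are illustrative assumptions; a positive check would typically fire one of the automated triggers described above.

    # A minimal sketch of the Continuous Monitoring deliverable: collecting live
    # prediction scores and comparing them to what was seen at training time.
    # The mean-difference drift check and the threshold are illustrative only.
    import statistics

    class LiveMonitor:
        def __init__(self, training_score_mean, alert_threshold=0.1):
            self.training_score_mean = training_score_mean
            self.alert_threshold = alert_threshold
            self.live_scores = []

        def record_prediction(self, score):
            """Called by the serving layer for every prediction made."""
            self.live_scores.append(score)

        def drift_detected(self):
            if not self.live_scores:
                return False
            live_mean = statistics.mean(self.live_scores)
            return abs(live_mean - self.training_score_mean) > self.alert_threshold

    if __name__ == "__main__":
        monitor = LiveMonitor(training_score_mean=0.82)
        for score in (0.80, 0.78, 0.65, 0.60):
            monitor.record_prediction(score)
        # A positive result would typically fire an automated re-training trigger.
        print("re-training trigger needed:", monitor.drift_detected())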

Implementing a full-blown MLOps environment involves numerous components and pieces of software that the MLOps engineer can use, including the following (a micro-service serving sketch follows the list):

  • Code versioning
  • Data artifacts
  • Model artifacts
  • CI/CD tools
  • Registries
  • Metadata tracking tools
  • Orchestrators
  • Monitoring tools
  • Micro-services and containers
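As an example of how a few of these components fit together, the sketch below exposes a placeholder model behind an HTTP endpoint using only the Python standard library. The /predict route and the stand-in model function are assumptions; in practice the model would be loaded from a model registry at start-up.

    # A minimal sketch of a containerisable micro-service serving a placeholder
    # model over HTTP. The /predict route and the stand-in model function are
    # assumptions; in practice the model would come from a model registry.
    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    def model_predict(features):
        """Placeholder for a model pulled from a model registry."""
        return {"score": sum(features) / max(len(features), 1)}

    class PredictHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/predict":
                self.send_response(404)
                self.end_headers()
                return
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length) or "{}")
            result = model_predict(payload.get("features", []))
            body = json.dumps(result).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("0.0.0.0", 8080), PredictHandler).serve_forever()

In a typical setup a service like this would be packaged as a container image, pushed to a registry, and rolled out by the CI/CD tooling listed above, while the monitoring component collects its live performance data.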

Numerous platforms from major industry vendors deliver a complete MLOps environment for AI pipelines, including Google Cloud, Azure Machine Learning, TensorFlow Extended (TFX), and more. The tutorials included in this use case page will cover these environments as well as proprietary architectures that our clients can deploy per their needs.