Artificial Intelligence - ML Operations
We define Machine Learning Operations (MLOps) as any operation in an environment where AI assets are treated jointly with all other software infrastructure assets in a continuous improvement context. It is a set of methodologies that ensure efficient and reliable deployment and maintenance of AI solutions in production. MLOps starts at the very beginning of the AI pipeline, during the initial design of the AI system, and follows the project lifecycle up to productionisation and monitoring. It involves concepts such as iterative development, continuous integration, continuous training, continuous deployment and improvement, versioning, as well as reproducibility and monitoring of every step of the pipeline. By applying MLOps best practices, an MLOps engineer significantly reduces the accumulation of technical debt over the lifecycle of the Artificial Intelligence pipeline.
The objective of MLOps is to fully automate the entire Artificial Intelligence pipeline, from its data engineering phase to the deployment and productionisation of the AI models. Triggering mechanisms can respond to various events that may occur at any phase of the pipeline, such as the arrival of new data or a drop in model performance. The response can range from a partial to a full re-instantiation of the pipeline, including redeployment of the AI models. Furthermore, a complete lineage of the entire process is recorded, allowing the AI team to inspect and monitor any change and its effect on the performance of the system.
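As a minimal sketch of how a trigger can lead to a partial or a full re-run, the Python below assumes a linear pipeline with hypothetical stage names and a hypothetical mapping from event types to the stage where re-execution should start. An event re-runs that stage and everything downstream of it, and every trigger is appended to a simple lineage log. Production orchestrators usually model stage dependencies as a graph rather than a list, so treat this as an illustration only.

```python
from dataclasses import dataclass, field

# Hypothetical stage names for a linear pipeline.
STAGES = ["ingest", "prepare", "train", "evaluate", "deploy"]

# Hypothetical mapping: event type -> first stage that must be re-executed.
EVENT_START_STAGE = {
    "new_data": "ingest",
    "schema_change": "prepare",
    "hyperparameter_change": "train",
    "metric_threshold_change": "evaluate",
    "serving_config_change": "deploy",
}


@dataclass
class TriggerLog:
    """Append-only record of which events re-ran which stages."""
    entries: list = field(default_factory=list)

    def record(self, event: str, stages: list) -> None:
        self.entries.append({"event": event, "stages": list(stages)})


def stages_to_rerun(event: str) -> list:
    """Return the stage the event lands on plus every downstream stage.

    Raises KeyError for event types that have no mapping.
    """
    start = EVENT_START_STAGE[event]
    return STAGES[STAGES.index(start):]


def handle_event(event: str, log: TriggerLog) -> list:
    stages = stages_to_rerun(event)
    log.record(event, stages)  # keep a lineage entry for every trigger
    return stages


log = TriggerLog()
print(handle_event("new_data", log))               # full re-instantiation
print(handle_event("serving_config_change", log))  # ['deploy'] only
```

New data re-runs all five stages, while a serving configuration change only redeploys, which mirrors the "partial to full re-instantiation" range described above.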
Reaching fully automated MLOps is usually an iterative process in itself, during which parts of the pipeline are automated through appropriate orchestration. Over time, more of the process is automated and all the phases integrate into a fully automated pipeline. Such a pipeline should be able to build, train, test, deploy and monitor all the steps of the AI process.
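The iterative path to full automation can be pictured as a pipeline in which some stages are already automated and others are still manual. In this illustrative Python sketch, `run_pipeline` and the stage functions are hypothetical and not part of any orchestrator's API: a stage set to `None` is still performed by hand, so the run stops there. As more stages receive an implementation, the same run proceeds further until build, train, test, deploy and monitor all execute unattended.

```python
from __future__ import annotations

from typing import Callable, Optional

# A step takes the pipeline context and returns an updated context.
# None marks a stage that is still carried out manually.
Step = Optional[Callable[[dict], dict]]


def run_pipeline(
    stages: list[tuple[str, Step]], context: dict
) -> tuple[list[str], Optional[str], dict]:
    """Run automated stages in order and stop at the first manual stage.

    Returns the names of the completed stages, the name of the first
    manual stage (or None if every stage is automated) and the final context.
    """
    completed: list[str] = []
    for name, step in stages:
        if step is None:
            return completed, name, context  # hand over to a human here
        context = step(context)
        completed.append(name)
    return completed, None, context


# Hypothetical example: only build and train have been automated so far.
def build(ctx: dict) -> dict:
    return {**ctx, "package": "built"}


def train(ctx: dict) -> dict:
    return {**ctx, "model": "trained"}


stages = [
    ("build", build),
    ("train", train),
    ("test", None),
    ("deploy", None),
    ("monitor", None),
]
done, blocked_at, ctx = run_pipeline(stages, {})
print(done, blocked_at)  # ['build', 'train'] test
```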
Lineage - Reproducibility
Continuous experimentation is a naturally embedded part of AI development, where research-centric practices such as testing multiple experiment settings simultaneously are common. A proper MLOps environment should provide tools that support lineage tracking of the experimentation process, including the steps, parameters, hyperparameters, and versions of the datasets, models and environment that preceded the productionisation of the pipeline. Reproducibility and versioning of experiments are key and can significantly increase team velocity during productionisation.
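One way to make an experiment reproducible is to capture everything that produced it in a single record. The Python sketch below is an assumption-based illustration rather than a specific tracking tool: it fingerprints the dataset with SHA-256, stores the parameters, hyperparameters, model version and basic environment details, and derives a deterministic experiment ID from the canonical JSON of that record. Identical inputs in the same environment therefore always map to the same ID, while any change (here, the learning rate) produces a new one. It assumes parameters and hyperparameters are JSON-serialisable dictionaries; all names and values are hypothetical.

```python
import hashlib
import json
import platform


def fingerprint(data: bytes) -> str:
    """SHA-256 content hash, used here as the dataset version identifier."""
    return hashlib.sha256(data).hexdigest()


def lineage_record(dataset: bytes, params: dict, hyperparams: dict, model_version: str) -> dict:
    """Collect what produced an experiment into one JSON-serialisable record."""
    record = {
        "dataset_sha256": fingerprint(dataset),
        "params": params,
        "hyperparams": hyperparams,
        "model_version": model_version,
        "environment": {
            "python": platform.python_version(),
            "platform": platform.platform(),
        },
    }
    # Canonical JSON (sorted keys, including nested dicts) makes the ID
    # deterministic: identical inputs in the same environment give the same ID.
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    record["experiment_id"] = hashlib.sha256(canonical).hexdigest()[:12]
    return record


# Hypothetical usage: two runs that differ only in the learning rate.
data = b"text,label\nsample,1\n"
run_a = lineage_record(data, {"train_split": 0.8}, {"learning_rate": 0.01}, "model-v1")
run_b = lineage_record(data, {"train_split": 0.8}, {"learning_rate": 0.02}, "model-v1")
print(run_a["experiment_id"] != run_b["experiment_id"])  # True
```

Including the environment in the hash is a deliberate choice in this sketch: the same code and data run on a different interpreter version is recorded as a distinct experiment.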
Each stage of the MLOps pipeline has specific deliverables expected from the MLOps engineer. The stages and their deliverables include:
- MLOps design: This phase sits at the onset of the AI pipeline and involves the usual software engineering stages, including business understanding, requirements, available resources and infrastructure, as well as an initial assessment of the availability of the required datasets.
- Experimentation and development: Source code automation for the entire AI pipeline, including data ingestion and preparation as well as modelling and evaluation of the AI components.
- Continuous Integration, Delivery and Monitoring: Deployable pipeline components, packaging and inference services; triggering components that recreate part or all of the pipeline; and collection of metrics on the live behaviour and performance of the AI components.
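For the monitoring deliverable, metrics collected on live behaviour can feed directly into a trigger. The Python sketch below assumes a single higher-is-better quality score recorded at deployment time (`baseline_score`, assumed positive) and a hypothetical relative-drop threshold `max_drop`. When the mean live score falls by more than that fraction, the result flags the pipeline for retraining. The threshold, the metric and the function name are placeholders to be chosen per use case.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class MonitorResult:
    live_score: float
    relative_drop: float
    retrain: bool


def check_live_performance(
    baseline_score: float, live_scores: list[float], max_drop: float = 0.05
) -> MonitorResult:
    """Compare the mean live score with the score recorded at deployment.

    Assumes a positive, higher-is-better metric. A relative drop larger
    than max_drop flags the pipeline for retraining.
    """
    if not live_scores:
        raise ValueError("no live metrics collected yet")
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    live = mean(live_scores)
    relative_drop = (baseline_score - live) / baseline_score
    return MonitorResult(live, relative_drop, relative_drop > max_drop)


# Hypothetical usage: baseline accuracy 0.90, live accuracy around 0.81.
result = check_live_performance(0.90, [0.80, 0.82, 0.81])
print(result.retrain)  # True (roughly a 10% relative drop)
```

Using a relative rather than absolute drop keeps one threshold meaningful across metrics with different scales; the retrain flag would then be consumed by the triggering components listed above.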
At present, the MLOps engineer has numerous software components available that can help her implement a full-blown MLOps AI environment. The list is continuously evolving, and solutions can be divided into proprietary architectures, built from components that provide code versioning, artifact and model storage in registries, orchestration and monitoring, as well as microservice containers and service implementation tools. Aside from proprietary solutions, numerous frameworks from major industry vendors deliver complete MLOps environments for AI pipelines on-premises, in the Cloud and in hybrid setups, including AWS SageMaker, Azure MLOps, Google Cloud and TensorFlow MLOps, amongst others. The Blueprints included in this use case page cover these environments as well as proprietary architectures that our clients can deploy according to their needs.
Enhancing real-time news streams using AWS serverless AI. An automated MLOps architecture using Terraform.