Last year, we published the Big Book of MLOps, outlining guiding principles, design considerations, and reference architectures for Machine Learning Operations (MLOps). Since then, Databricks has added key features simplifying MLOps, and Generative AI has brought new requirements to MLOps platforms and processes. We are excited to announce a new version of the Big Book of MLOps covering these product updates and Generative AI requirements.
This blog post highlights key updates in the eBook, which can be downloaded here. We provide updates on governance, serving, and monitoring, and discuss the design decisions that accompany them. We reflect these updates in improved reference architectures. We also include a new section on LLMOps (MLOps for Large Language Models), where we discuss the implications of LLMs for MLOps, key components of LLM-powered applications, and LLM-specific reference architectures.
This blog post and eBook will be useful to ML Engineers, ML Architects, and other roles looking to understand the latest in MLOps and the impact of Generative AI on MLOps.
The rest of the blog post covers a brief review of MLOps, the key product updates, the improved reference architectures, and the new LLMOps guidance.
If you have not read the original Big Book of MLOps, this section gives a brief recap. The same motivations, guiding principles, semantics, and deployment patterns form the basis of our updated MLOps best practices.
We keep our definition of MLOps as a set of processes and automation for managing data, code, and models to meet the two goals of stable performance and long-term efficiency in ML systems.
MLOps = DataOps + DevOps + ModelOps
In our experience working with customers like CareSource and Walgreens, implementing MLOps architectures accelerates the time to production for ML-powered applications, reduces the risk of poor performance and non-compliance, and reduces long-term maintenance burdens on Data Science and ML teams.
Our guiding principles remain the same:
The first principle, taking a data-centric approach, lies at the heart of the updates in the eBook. As you read below, you will see how our "Lakehouse AI" philosophy unifies data and AI at both the governance and model/pipeline layers.
We structure MLOps in terms of how ML assets—code, data, and models—are organized into stages, from development to staging to production. These stages correspond to progressively stricter access controls and stronger quality guarantees.
We discuss how code and/or models are deployed from development toward production, and the tradeoffs between deploying code, deploying models, or both. Our architectures illustrate the deploy-code pattern, but the guidance remains largely the same for deploying models.
For more details on any of these topics, please refer to the original eBook.
In this section, we outline the key product features that improve our MLOps architecture. For each of these, we highlight the benefits they bring and their impact on our end-to-end MLOps workflow.
A data-centric AI platform must provide unified governance for both data and AI assets on top of the Lakehouse. Databricks Unity Catalog centralizes access control, auditing, lineage, and data discovery capabilities across Databricks workspaces.
Unity Catalog now includes MLflow Models and Feature Engineering. This unification allows simpler management of AI projects that include both data and AI assets. For ML teams, it means more efficient access and scalable processes, especially for lineage, discovery, and collaboration. For administrators, it means simpler governance at the project or workflow level.
Within Unity Catalog, a given catalog contains schemas, which in turn may contain tables, volumes, functions, models, and other assets. Models can have multiple versions and can be tagged with aliases. In the eBook, we provide recommended organization schemes for AI projects at the catalog and schema level, but Unity Catalog has the flexibility to be tailored to any organization's existing practices.
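To make this concrete, here is a minimal sketch of registering a model version to Unity Catalog and tagging it with an alias using MLflow. The catalog, schema, and model names (`ml_prod.fraud.fraud_classifier`) and the `champion` alias are illustrative assumptions, and the snippet assumes it runs in a Databricks workspace with Unity Catalog enabled.

```python
import mlflow
from mlflow.models import infer_signature
from mlflow.tracking import MlflowClient
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Point the MLflow registry at Unity Catalog rather than the workspace registry.
mlflow.set_registry_uri("databricks-uc")

# Train a toy model and log it with a signature (required for Unity Catalog models).
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
with mlflow.start_run() as run:
    model = LogisticRegression().fit(X, y)
    signature = infer_signature(X, model.predict(X))
    mlflow.sklearn.log_model(model, artifact_path="model", signature=signature)

# Register the model under <catalog>.<schema>.<model>; names here are illustrative.
model_version = mlflow.register_model(
    model_uri=f"runs:/{run.info.run_id}/model",
    name="ml_prod.fraud.fraud_classifier",
)

# Tag this version with an alias so downstream jobs can reference it stably.
MlflowClient().set_registered_model_alias(
    name="ml_prod.fraud.fraud_classifier",
    alias="champion",
    version=model_version.version,
)
```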
Databricks Model Serving provides a production-ready, serverless solution to simplify real-time model deployment, behind APIs to power applications and websites. Model Serving reduces operational costs, streamlines the ML lifecycle, and makes it easier for Data Science teams to focus on the core task of integrating production-grade real-time ML into their solutions.
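As a rough sketch of what deployment can look like, the snippet below calls the Databricks serving-endpoints REST API to create an endpoint for a Unity Catalog model. The endpoint name, model name, and environment variables are illustrative assumptions, and the request fields may differ slightly across API versions; treat this as a starting point rather than a definitive recipe.

```python
import os
import requests

# Illustrative sketch: create a Model Serving endpoint for a Unity Catalog model.
# Host, token, and all names below are assumptions; consult the serving-endpoints
# API documentation for the authoritative request schema.
host = os.environ["DATABRICKS_HOST"]    # e.g. "https://<workspace>.cloud.databricks.com"
token = os.environ["DATABRICKS_TOKEN"]

endpoint_config = {
    "name": "fraud-classifier-endpoint",  # hypothetical endpoint name
    "config": {
        "served_entities": [
            {
                "entity_name": "ml_prod.fraud.fraud_classifier",  # hypothetical UC model
                "entity_version": "1",
                "workload_size": "Small",
                "scale_to_zero_enabled": True,
            }
        ]
    },
}

resp = requests.post(
    f"{host}/api/2.0/serving-endpoints",
    headers={"Authorization": f"Bearer {token}"},
    json=endpoint_config,
)
resp.raise_for_status()
print(resp.json())
```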
In the eBook, we discuss two key design decision areas:
We also discuss implementation details in Databricks, including:
Databricks Lakehouse Monitoring is a data-centric monitoring solution to ensure that both data and AI assets are of high quality and reliable. Built on top of Unity Catalog, it provides the unique ability to implement both data and model monitoring, while maintaining lineage between the data and AI assets of an MLOps solution. This unified and centralized approach to monitoring simplifies the process of diagnosing errors, detecting quality drift, and performing root cause analysis.
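For illustration, the sketch below attaches an inference-log monitor to a hypothetical table of model requests and predictions using the Lakehouse Monitoring Python client. All table, schema, and column names are assumptions, and parameter names may vary across client versions.

```python
from databricks import lakehouse_monitoring as lm

# Illustrative sketch: monitor a Unity Catalog table that captures model inputs,
# predictions, and (optionally) ground-truth labels. Names below are assumptions.
lm.create_monitor(
    table_name="ml_prod.fraud.inference_logs",  # hypothetical inference log table
    profile_type=lm.InferenceLog(
        problem_type="classification",
        prediction_col="prediction",
        label_col="label",            # ground-truth labels, if and when available
        timestamp_col="scored_at",
        model_id_col="model_version", # enables per-model-version quality metrics
        granularities=["1 day"],
    ),
    output_schema_name="ml_prod.fraud",  # schema where metric tables are written
)
```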
The eBook discusses implementation details in Databricks, including:
MLOps Stacks is an updated infrastructure-as-code solution that helps accelerate the creation of MLOps architectures. This repository provides a customizable stack for starting new ML projects on Databricks, instantiating pipelines for model training, model deployment, CI/CD, and more.
MLOps Stacks is built on top of Databricks Asset Bundles, which define infrastructure-as-code for data, analytics, and ML. Asset bundles allow you to validate, deploy, and run Databricks workflows such as Databricks jobs and Delta Live Tables, and to manage ML assets such as MLflow models and experiments.
The updated eBook provides several reference architectures:
Below, we provide a multi-environment view. Much of the architecture remains the same, but it is now even easier to implement with the latest updates from Databricks.
The main architectural update is that both data and ML assets are now managed as Lakehouse assets in Unity Catalog. Note that the significant improvements to Model Serving and Lakehouse Monitoring do not change the architecture, but they make it simpler to implement.
We end the updated eBook with a new section on LLMOps, or MLOps for Large Language Models (LLMs). We speak in terms of "LLMs," but many best practices translate to other Generative AI models as well. We first discuss major changes introduced by LLMs and then provide detailed best practices around key components of LLM-powered applications. The eBook also provides reference architectures for common Retrieval-Augmented Generation (RAG) applications.
The table below is an abbreviated version of the eBook table, which lists key properties of LLMs and their implications for MLOps platforms and practices.
| Key properties of LLMs | Implications for MLOps |
|---|---|
| LLMs are available in many forms: proprietary models behind paid APIs, open source models, and custom models fine-tuned or trained on curated data. | Development process: Projects often develop incrementally, starting from existing, third-party or open source models and ending with custom models (fine-tuned or fully trained on curated data). |
| Many LLMs take general queries and instructions as input. Those queries can contain carefully engineered "prompts" to elicit the desired responses. | Development process: Prompt engineering is a new, important part of developing many AI applications. Packaging ML artifacts: LLM "models" may be diverse, including API calls, prompt templates, chains, and more. |
| Many LLMs can be given prompts with examples or context. | Serving infrastructure: When augmenting LLM queries with context, it is valuable to use tools such as vector databases to search for relevant context. |
| Proprietary and OSS models can be used via paid APIs. | API governance: It is important to have a centralized system for API governance of rate limits, permissions, quota allocation, and cost attribution. |
| LLMs are very large deep learning models, often ranging from gigabytes to hundreds of gigabytes. | Serving infrastructure: GPUs and fast storage are often essential. Cost/performance trade-offs: Specialized techniques for reducing model size and computation have become more important. |
| LLMs are hard to evaluate via traditional ML metrics since there is often no single "right" answer. | Human feedback: Since evaluation often relies on human feedback, this feedback should be incorporated directly into the MLOps process, including testing, monitoring, and capture for use in future fine-tuning. |
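To illustrate the context-augmentation pattern from the table above, here is a self-contained toy sketch that retrieves the most relevant document for a question and splices it into a prompt template. It uses TF-IDF similarity purely to stay dependency-light; a real application would use an embedding model and a vector database, and the documents and prompt wording here are invented for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy illustration of retrieval-augmented prompting: find the document most
# relevant to a question, then include it as context in the prompt sent to an LLM.
documents = [
    "Model Serving exposes models behind a REST API with serverless compute.",
    "Unity Catalog governs tables, volumes, functions, and models in one place.",
    "Lakehouse Monitoring tracks the quality of data and model outputs over time.",
]

question = "How do I govern models and data together?"

# Vectorize the corpus and the question with the same vocabulary.
vectorizer = TfidfVectorizer().fit(documents + [question])
doc_vectors = vectorizer.transform(documents)
question_vector = vectorizer.transform([question])

# Pick the most similar document as the context for this query.
best_doc = documents[cosine_similarity(question_vector, doc_vectors).argmax()]

prompt = (
    "Answer the question using only the context provided.\n"
    f"Context: {best_doc}\n"
    f"Question: {question}\n"
    "Answer:"
)
print(prompt)  # This prompt would then be sent to an LLM serving endpoint.
```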
The eBook includes a section for each topic below, with detailed explanations and links to resources.
This blog is merely an overview of the explanations, best practices, and architectural guidance in the full eBook. To learn more and to get started on updating your MLOps platform and practices, we recommend that you: