Databricks Mosaic AI provides unified tooling to build, deploy and monitor AI and ML solutions — from building predictive models to the latest GenAI and large language models (LLMs). Built on the Databricks Data Intelligence Platform, Mosaic AI enables organizations to securely and cost-effectively integrate their enterprise data into the AI lifecycle.
Complete Control
Maintain ownership over both the models and the data
Production Quality
Deliver accurate, safe and governed AI applications
Lower Cost
Train and serve your own custom LLMs at 10x lower cost
Start building your generative AI solution
There are four architectural patterns to consider when building a large language model–based solution: prompt engineering, retrieval augmented generation (RAG), fine-tuning and pretraining. Databricks is the only provider that enables all four generative AI architectural patterns, so you have the most options and can evolve as your business requirements change.
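As a toy illustration of how two of these patterns differ, the sketch below contrasts a plain prompt with a retrieval-augmented one: RAG ranks enterprise documents against the query and prepends the best matches to the prompt. The document store, scoring function and prompt template are hypothetical stand-ins, not a Databricks API.

```python
def score(query: str, doc: str) -> int:
    """Rank documents by naive keyword overlap with the query."""
    q_words = set(query.lower().split())
    return len(q_words & set(doc.lower().split()))

def build_prompt(query: str, docs: list[str], top_k: int = 1) -> str:
    """RAG: prepend the most relevant enterprise documents to the prompt."""
    ranked = sorted(docs, key=lambda d: score(query, d), reverse=True)
    context = "\n".join(ranked[:top_k])
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Refunds are processed within 5 business days.",
    "Our headquarters are in San Francisco.",
]
prompt = build_prompt("How long do refunds take?", docs)
print(prompt)
```

A real RAG pipeline would use vector embeddings rather than keyword overlap, but the shape is the same: retrieve, augment, then generate.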
Complete ownership over your models and data
Mosaic AI is part of the Databricks Data Intelligence Platform, which unifies data, model training and production environments in a single solution. You can securely use your enterprise data to augment, fine-tune or build your own machine learning and generative AI models, powering them with a semantic understanding of your business without sending your data and IP outside your walls.
Deploy and govern all your AI models centrally
Model Serving is a unified service for deploying, governing and querying AI models. Our unified approach makes it easy to experiment with and productionize models. This includes:
- Custom ML models like PyFunc, scikit-learn and LangChain
- Foundation models (FMs) on Databricks like Llama 2, MPT, Mistral and BGE
- Foundation models hosted elsewhere like ChatGPT, Claude 2, Cohere and Stable Diffusion
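The custom-model path above typically goes through MLflow's pyfunc flavor. The class below is a minimal stand-in that only mirrors the `mlflow.pyfunc.PythonModel` interface (a `predict(context, model_input)` method); in practice you would subclass that class and log it with MLflow before serving. The sentiment model itself is an invented example.

```python
class SentimentModel:
    """Hypothetical custom model: a keyword-based sentiment scorer."""
    POSITIVE = {"good", "great", "excellent"}

    def predict(self, context, model_input: list[str]) -> list[int]:
        # Return 1 if any positive keyword appears in the text, else 0.
        return [
            int(any(w in self.POSITIVE for w in text.lower().split()))
            for text in model_input
        ]

model = SentimentModel()
print(model.predict(None, ["great product", "broken on arrival"]))  # [1, 0]
```

Because every model type behind the service exposes the same predict-style contract, client code can query a scikit-learn model, a LangChain chain or a hosted foundation model through one interface.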
Monitor data, features and AI models in one place
Lakehouse Monitoring provides a single, unified monitoring solution inside the Databricks Data Intelligence Platform. It monitors the statistical properties and quality of all tables with a single click. For applications powered by generative AI, it can scan outputs for toxic and unsafe content as well as diagnose errors.
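The kinds of checks such a monitor runs can be sketched in a few lines: compare a current window's statistics against a baseline and flag drift, and scan generated outputs against a blocklist. The z-score threshold and word list below are illustrative choices, not Lakehouse Monitoring's actual logic.

```python
from statistics import mean, stdev

def drifted(baseline: list[float], current: list[float], z: float = 3.0) -> bool:
    """Flag drift if the current mean is more than z baseline stdevs away."""
    return abs(mean(current) - mean(baseline)) > z * stdev(baseline)

def flag_unsafe(outputs: list[str], blocklist: set[str]) -> list[str]:
    """Scan generated text for blocked terms (toy stand-in for safety scans)."""
    return [o for o in outputs if set(o.lower().split()) & blocklist]

baseline = [10.0, 10.5, 9.8, 10.2, 9.9]
print(drifted(baseline, [10.1, 9.9, 10.0]))   # False
print(drifted(baseline, [14.0, 15.0, 14.5]))  # True
```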
Govern and track lineage across the full AI lifecycle — from data to models
Enforce proper permissions, set rate limits and track lineage to meet stringent security and governance requirements. All ML assets from data to models can be governed with a single tool, Unity Catalog, to help ensure consistent oversight and control at every stage of the ML lifecycle through development, deployment and maintenance.
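One of the governance controls mentioned above, a per-endpoint rate limit, is often pictured as a token bucket: each request consumes a token and tokens refill over time. This is a generic illustration of the concept, not Unity Catalog's implementation.

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=1.0, capacity=2)
print([bucket.allow() for _ in range(3)])  # [True, True, False]
```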
Train and serve your own custom LLMs at 10x lower cost
With Mosaic AI, you can build your own custom large language model from scratch, ensuring the model’s foundational knowledge is tailored to your specific domain. Training on your organization’s data and IP produces a customized model that is uniquely differentiated. Databricks Mosaic AI Training is an optimized training solution that can build new multibillion-parameter LLMs in days with up to 10x lower training costs.
Collaborative Notebooks
Databricks Notebooks natively support Python, R, SQL and Scala so practitioners can work together with the languages and libraries of their choice to discover, visualize and share insights.
Runtime for Machine Learning
One-click access to preconfigured ML-optimized clusters, powered by a scalable and reliable distribution of the most popular ML frameworks (such as PyTorch, TensorFlow and scikit-learn), with built-in optimizations for unmatched performance at scale.
Feature Store
Facilitate the reuse of features with a data lineage–based feature search that leverages automatically logged data sources. Make features available for training and serving with simplified model deployment that doesn’t require changes to the client application.
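The lookup pattern a feature store enables can be sketched like this: training rows reference entities by key, and precomputed features are joined in at training (and serving) time so the client application never recomputes them. The table, keys and feature names below are invented for illustration.

```python
# Precomputed, shared feature table keyed by user_id.
features = {
    "u1": {"avg_spend": 42.0, "visits_30d": 7},
    "u2": {"avg_spend": 13.5, "visits_30d": 2},
}

def lookup(rows: list[dict], feature_table: dict) -> list[dict]:
    """Join stored features onto raw training rows by entity key."""
    return [{**row, **feature_table[row["user_id"]]} for row in rows]

training_rows = [{"user_id": "u1", "label": 1}, {"user_id": "u2", "label": 0}]
print(lookup(training_rows, features))
```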
AutoML
Empower everyone from ML experts to citizen data scientists with a “glass box” approach to AutoML that not only delivers the highest-performing model but also generates the code behind it for further refinement by experts.
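"Glass box" here means the search returns runnable code alongside the winning model. The miniature below fits two candidate models, scores them by squared error, and hands back both the best model and a readable one-line description of it; the candidates and scoring are toy stand-ins for what AutoML actually runs.

```python
def fit_mean(xs, ys):
    """Constant baseline: always predict the mean of y."""
    m = sum(ys) / len(ys)
    return (lambda x: m), f"predict = lambda x: {m}  # constant baseline"

def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return (lambda x: a * x + b), f"predict = lambda x: {a} * x + {b}"

def automl(xs, ys):
    """Try each candidate, keep the one with the lowest squared error."""
    return min(
        (fit(xs, ys) for fit in (fit_mean, fit_linear)),
        key=lambda mc: sum((mc[0](x) - y) ** 2 for x, y in zip(xs, ys)),
    )

model, code = automl([1, 2, 3, 4], [2, 4, 6, 8])
print(code)
```

The expert-facing payoff is the second return value: generated code that a practitioner can inspect and refine, rather than an opaque artifact.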
Managed MLflow
Built on top of MLflow — the world’s leading open source platform for the ML lifecycle — Managed MLflow helps ML models quickly move from experimentation to production, with enterprise security, reliability and scale.
Production-Grade Model Serving
Serve models at any scale with one-click simplicity, with the option to leverage serverless compute.
Model Monitoring
Monitor model performance and how it affects business metrics in real time. Databricks delivers end-to-end visibility and lineage from models in production back to source data systems, helping analyze model and data quality across the full ML lifecycle, and pinpoint issues before they have a damaging impact.
Repos
Repos allows engineers to follow Git workflows in Databricks, enabling data teams to leverage automated CI/CD workflows and code portability.
Large Language Models
Databricks makes it simple to deploy, govern, query and monitor LLMs and integrate them into your workflows. The platform provides capabilities for augmenting LLMs with retrieval (RAG) or fine-tuning them on your own data, resulting in better domain performance. We also provide optimized tools to pretrain your own LLMs in days — at 10x lower cost.