by Patrick Wendell and Naveen Rao
Over the last year, we have seen a surge of commercial and open-source foundation models showing strong reasoning abilities on general knowledge tasks. While general models are an important building block, production AI applications often employ Compound AI Systems, which leverage multiple components such as tuned models, retrieval, tool use, and reasoning agents. These AI agent systems augment foundation models to drive much better quality and help customers confidently take these GenAI apps to production.
Today at the Data and AI Summit, we announced several new capabilities that make Databricks Mosaic AI the best platform for building production-quality AI agent systems. These features are based on our experience working with thousands of companies to put AI-powered applications into production. Today’s announcements include support for fine-tuning foundation models, an enterprise catalog for AI tools, a new SDK for building, deploying, and evaluating AI agents, and a unified AI gateway for governing deployed AI services.
With this announcement, Databricks has entirely integrated and substantially expanded the model-building capabilities first included in our MosaicML acquisition one year ago.
The evolution from monolithic AI models to compound systems is an active area of both academic and industry research. Recent results have found that “state-of-the-art AI results are increasingly obtained by compound systems with multiple components, not just monolithic models.” These findings are reinforced by what we see in our customer base. Take, for example, financial research firm FactSet: when they deployed a commercial LLM for their Text-to-Financial-Formula use case, they could only get 55% accuracy in the generated formulas. Modularizing their model into a compound system allowed them to specialize each task and achieve 85% accuracy. Databricks Mosaic AI supports building AI systems through the following products:
Users only have to select a task and a base model and provide training data (as a Delta table or a .jsonl file) to get a fully fine-tuned model that they own for their specialized task.
Mosaic AI Model Serving now supports function calling, and users can quickly experiment with functions and base models in the AI Playground.
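To make the function-calling item above more concrete, here is a minimal, self-contained sketch of what the application side of function calling can look like. It is not the Mosaic AI Model Serving API. It assumes the model returns tool calls in the widely used OpenAI-style JSON shape. The tool name `get_ticket_status`, its schema, and the simulated model output are all hypothetical.

```python
import json


# Hypothetical tool: a stand-in for a real lookup, returning canned data.
def get_ticket_status(ticket_id: str) -> dict:
    return {"ticket_id": ticket_id, "status": "open"}


# Tool description the application would send alongside the prompt.
# The OpenAI-style schema shape is an assumption for illustration.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_ticket_status",
            "description": "Look up the status of a support ticket.",
            "parameters": {
                "type": "object",
                "properties": {"ticket_id": {"type": "string"}},
                "required": ["ticket_id"],
            },
        },
    }
]

# Maps tool names the model may request to local Python callables.
REGISTRY = {"get_ticket_status": get_ticket_status}


def dispatch(tool_call: dict) -> str:
    """Run the function named in a model-produced tool call; return JSON text."""
    name = tool_call["function"]["name"]
    if name not in REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    # Arguments arrive as a JSON-encoded string in this assumed format.
    args = json.loads(tool_call["function"]["arguments"])
    return json.dumps(REGISTRY[name](**args))


# Simulated model output (no network call is made in this sketch).
example_call = {
    "id": "call_1",
    "type": "function",
    "function": {
        "name": "get_ticket_status",
        "arguments": '{"ticket_id": "T-42"}',
    },
}

if __name__ == "__main__":
    print(dispatch(example_call))
```

In a real application, the string returned by `dispatch` would be sent back to the model as the tool result so it can compose a final answer. Keeping tools in a registry makes it straightforward to swap functions or base models while experimenting.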
General-purpose AI models optimize for benchmarks, such as MMLU, but deployed AI systems are instead designed to solve specific user tasks as part of a broader product (such as answering a support ticket, generating a query, or suggesting a response). To make sure these systems work well, it’s important to have a robust evaluation framework for defining quality metrics, gathering quality signals, and iterating on performance. Today we’re excited to announce several new evaluation tools:
Mosaic AI Agent Evaluation provides AI-assisted metrics to help developers form quick intuitions.
Mosaic AI Agent Evaluation allows stakeholders, even those outside the Databricks Platform, to assess model outputs and provide ratings to help iterate on quality.
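As a rough illustration of combining the two kinds of quality signal described above, the following sketch aggregates an automated metric with stakeholder ratings. It is not the Agent Evaluation API. A simple exact-match check stands in for AI-assisted metrics, and the record fields (`predicted`, `expected`, `human_rating`) are hypothetical names chosen for this example.

```python
from statistics import mean


def exact_match(predicted: str, expected: str) -> bool:
    """Case- and whitespace-insensitive comparison; a stand-in for richer metrics."""
    return predicted.strip().lower() == expected.strip().lower()


def evaluate(records: list) -> dict:
    """Summarize automated accuracy and human ratings over evaluation records.

    Each record is a dict with 'predicted' and 'expected' strings and an
    optional 'human_rating' (assumed here to be on a 1-5 scale).
    """
    matches = [exact_match(r["predicted"], r["expected"]) for r in records]
    ratings = [r["human_rating"] for r in records if r.get("human_rating") is not None]
    return {
        "n": len(records),
        "accuracy": sum(matches) / len(matches) if matches else 0.0,
        "mean_human_rating": mean(ratings) if ratings else None,
    }


# Hypothetical evaluation set: three of the four answers match.
sample_records = [
    {"predicted": "SUM(revenue)", "expected": "sum(revenue)", "human_rating": 5},
    {"predicted": "AVG(price)", "expected": "AVG(price)", "human_rating": 4},
    {"predicted": "MAX(cost)", "expected": "MIN(cost)", "human_rating": 3},
    {"predicted": " COUNT(*) ", "expected": "COUNT(*)"},
]

if __name__ == "__main__":
    print(evaluate(sample_records))
```

Tracking both numbers side by side helps show where automated metrics and human judgment disagree, which is often where the next iteration should focus.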
Corning is a materials science company - our glass and ceramics technologies are used in many industrial and scientific applications, so understanding and acting on our data is essential. We built an AI research assistant using Databricks Mosaic AI Agent Framework to index hundreds of thousands of documents including US patent office data. Having our LLM-powered assistant respond to questions with high accuracy was extremely important to us - that way, our researchers could find and further the tasks they were working on. To implement this, we used Databricks Mosaic AI Agent Framework to build a Generative AI solution augmented with the U.S. patent office data. By leveraging the Databricks Data Intelligence Platform, we significantly improved retrieval speed, response quality, and accuracy. - Denis Kamotsky, Principal Software Engineer, Corning
Amid the explosion of state-of-the-art foundation models, we’ve seen our customer base rapidly adopt new models: DBRX had a thousand customers experimenting with it within two weeks of launch, and we’re seeing hundreds of customers experimenting with the recently released Llama3 models. Many enterprises find it difficult to support these newer models in their platform within a reasonable timeframe, and changes in prompt structures and querying interfaces make them difficult to implement. Furthermore, as enterprises open access to the latest and greatest models, people get excited and build many applications, which can quickly snowball into a mess of governance issues. Common governance issues include rate limits being hit and impacting production applications, exploding costs as people run GenAI models on large tables, and data leakage concerns as PII is sent to third-party model providers. Today, we’re excited to announce new capabilities in AI Gateway for governance and a curated model catalog to enable model discovery. Features included are:
Databricks Model Serving is accelerating our AI-driven projects by making it easy to securely access and manage multiple SaaS and open models, including those hosted on or outside Databricks. Its centralized approach simplifies security and cost management, allowing our data teams to focus more on innovation and less on administrative overhead. - Greg Rokita, AVP, Technology at Edmunds.com
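Two of the governance issues named earlier, rate limits and PII reaching third-party providers, can be pictured with a small sketch of gateway-style checks. This is not how AI Gateway is implemented or configured. It is a generic illustration using a fixed-window rate limiter and a regex-based email redactor. The class name, the limits, and the redaction pattern are all assumptions for this example, and real PII detection needs far more than one regex.

```python
import re
import time

# Deliberately simple email pattern, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text: str) -> str:
    """Replace email addresses before a prompt leaves the organization."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


class FixedWindowRateLimiter:
    """Allow at most max_requests per user within each window_seconds window."""

    def __init__(self, max_requests: int, window_seconds: float, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested deterministically
        self._windows = {}  # user -> (window_start, request_count)

    def allow(self, user: str) -> bool:
        now = self.clock()
        start, count = self._windows.get(user, (now, 0))
        # Start a fresh window once the previous one has elapsed.
        if now - start >= self.window_seconds:
            start, count = now, 0
        if count >= self.max_requests:
            self._windows[user] = (start, count)
            return False
        self._windows[user] = (start, count + 1)
        return True


if __name__ == "__main__":
    limiter = FixedWindowRateLimiter(max_requests=2, window_seconds=60)
    prompt = "Summarize the complaint from jane.doe@example.com"
    for attempt in range(3):
        if limiter.allow("analyst-1"):
            print("forwarding:", redact_emails(prompt))
        else:
            print("rejected: rate limit reached")
```

Applying checks like these at one central point, rather than in every application, is what lets per-user limits protect production traffic from experimental workloads.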
Databricks Mosaic AI empowers teams to build and collaborate on compound AI systems from a single platform with centralized governance and a unified interface to train, track, evaluate, swap, and deploy. By leveraging enterprise data, organizations can move from general knowledge to data intelligence. This evolution empowers organizations to get to more relevant insights faster.
We’re excited to see what innovations our customers build next!