by Naveen Rao, Matei Zaharia and Patrick Wendell
The proof of concept (POC) of any new technology often starts with large, monolithic units that are difficult to characterize. By definition, POCs are designed to show that a technology works without considering issues around extensibility, maintenance, and quality. However, once a technology matures and is deployed widely, those concerns drive product development toward smaller, more manageable units. This is the fundamental concept behind systems thinking, and it is why we are seeing AI implementations move from standalone models to AI agent systems.
The concept of modular design has been applied to virtually every engineered system: as a technology matures, it is decomposed into modular, composable units that can be independently verified and connected. While 50 years ago software could be written as a single stream of commands, this is almost unthinkable in a modern development environment. Software engineering evolved practices to manage complexity, resulting in portable, extensible, maintainable code. Today, developers divide problems into manageable subunits with well-defined interfaces between them. Functionality is compartmentalized; modifying one component does not require changes to the entire system. As long as a component correctly services its interfaces to other modules, the integrated system continues to work as intended. This composability enables extensibility: components can be combined in new ways, or with new components, to build different systems.
Large language models (LLMs) have, until recently, operated in a monolithic regime: incorporating new training data often required fully retraining the model, and the impact of customizations was difficult to characterize. Early on, LLMs were unreliable, inscrutable units; it was unclear whether their output relied on supplied, verified data or on information already present in the training data. This “black box” behavior made them ill-suited for enterprise use cases that require a high degree of control, reliability, and predictability in customer-facing applications. In addition, regulated industries have legal and compliance frameworks to which interactions with customers must conform. For instance, healthcare systems are required to provide healthcare data to patients, but there are restrictions on how that data may be interpreted for patients. By separating the retrieval of data from its interpretation, healthcare systems can verify the correctness of the data separately from the correctness of the interpretation. AI agent systems give organizations the ability to parcel out these different functions and control each of them separately. One such function is deterministic access to data (through function calls or database queries) that forms the foundation for all of the system's responses. In scenarios like these, the goal is to provide a defined set of data as the source of ground truth for every response the system produces.
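To make that separation concrete, here is a minimal Python sketch, not tied to any particular framework; the data model, patient identifier, and function names are all hypothetical. A deterministic retrieval step supplies the verified records, and a separate interpretation step, which in a real system would call an LLM constrained to those records, is stubbed out. Each step can be qualified on its own.

```python
from dataclasses import dataclass

@dataclass
class LabResult:
    test: str
    value: float
    unit: str

# Deterministic component: a plain database/function lookup.
# Its correctness can be verified independently of any model.
PATIENT_DB = {
    "patient-123": [LabResult("hemoglobin_a1c", 6.1, "%")],
}

def retrieve_results(patient_id: str) -> list[LabResult]:
    """Return the verified records that ground every downstream response."""
    return PATIENT_DB.get(patient_id, [])

def interpret_results(results: list[LabResult]) -> str:
    """Probabilistic component: in a real system this would call an LLM,
    constrained to the retrieved records. Stubbed here for illustration."""
    facts = "; ".join(f"{r.test} = {r.value}{r.unit}" for r in results)
    return f"Summary based only on retrieved data: {facts}"

if __name__ == "__main__":
    records = retrieve_results("patient-123")   # auditable data access
    print(interpret_results(records))           # separately controlled interpretation
```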
These requirements necessitate a new way to build end-to-end intelligence applications. Earlier this year, we introduced the concept of compound AI systems (CAIS) in a blog post published by Berkeley AI Research (BAIR). AI agent systems apply the concept of CAIS and modular design theory to real-world AI systems development. AI agent systems use multiple components (including models, retrievers, and vector databases) as well as tools for evaluation, monitoring, security, and governance. These multiple interacting components offer much higher quality outputs than a single foundation model and enable AI developers to deploy independently verifiable components that are easier to maintain and update. We are now seeing large AI labs like OpenAI move in this direction: ChatGPT can access the internet through a tools interface, and its latest reasoning model, o1, uses multiple interacting components in its reasoning chain.
In contrast to standard application software, intelligence applications have probabilistic components and deterministic components that must interact in predictable ways. Human inputs are inherently ambiguous; LLMs now give us the ability to use context to interpret the intent of a request and convert it into something more deterministic. To service the request, it might be necessary to retrieve specific facts, execute code, and apply a reasoning framework based on previously learned transformations. All of this information must then be reassembled into a coherent output, formatted correctly for whoever (or whatever) will consume it. Modularizing allows the developer to separate the parts of the application that are completely deterministic (such as database lookups or calculators), partially ambiguous (such as contextual processing of a prompt), and completely creative (rendering new designs or novel prose).
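As a rough illustration of that separation, the sketch below wires together a fully deterministic lookup, a stand-in for the partially ambiguous intent interpretation an LLM would perform, and a stand-in for the creative generation step. The function names and the keyword heuristic are illustrative only, not part of any specific framework.

```python
# Illustrative only: each component has a narrow interface, so it can be
# swapped or tested without touching the rest of the system.

def lookup_price(sku: str) -> float:
    """Fully deterministic: a database lookup (stubbed with a dict)."""
    return {"sku-42": 19.99}.get(sku, 0.0)

def classify_intent(user_message: str) -> str:
    """Partially ambiguous: in practice an LLM would interpret the request
    in context. A keyword heuristic stands in for the model call here."""
    return "price_query" if "price" in user_message.lower() else "chitchat"

def draft_reply(facts: dict) -> str:
    """Creative: render the retrieved facts as prose. In a real system this
    is where the generative model writes the final answer."""
    return f"The item {facts['sku']} currently costs ${facts['price']:.2f}."

def handle(user_message: str, sku: str) -> str:
    intent = classify_intent(user_message)
    if intent == "price_query":
        price = lookup_price(sku)                     # deterministic fact
        return draft_reply({"sku": sku, "price": price})
    return "Happy to help with product questions."

print(handle("What is the price of this?", "sku-42"))
```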
Most intelligence applications share the same logical components: interpreting the intent of a request, deterministically retrieving data or invoking tools, applying reasoning to the retrieved information, and generating a well-formed response for whoever or whatever consumes it.
At Databricks, we have created the Mosaic AI Agent Framework to make it easy to build these end-to-end systems. The framework can be used to define evaluation criteria for a system and score its quality for the given application. The Mosaic AI Gateway provides access controls, rate limiting, payload logging, and guardrails (filtering of system inputs and outputs). The gateway also gives users continuous monitoring of running systems for safety, bias, and quality.
Today, the typical components of an AI agent system include one or more models, retrievers and vector databases for grounding responses in data, tool or function-calling interfaces, and surrounding services for evaluation, monitoring, security, and governance; a minimal sketch of how these pieces compose follows below.
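The sketch below is framework-agnostic and purely illustrative: a toy retriever stands in for a vector-database lookup, a stub stands in for the model call, and a guardrail plus a logging hook stand in for the governance layer. None of the class or function names refer to an actual product API.

```python
# A minimal, framework-agnostic sketch of how such components compose.
# Every class, check, and name here is illustrative, not a product API.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

class Retriever:
    """Stands in for a vector-database lookup returning grounding documents."""
    def __init__(self, docs: list[str]):
        self.docs = docs

    def search(self, query: str, k: int = 2) -> list[str]:
        # Toy relevance: keyword overlap instead of embedding similarity.
        words = query.lower().split()
        scored = sorted(self.docs, key=lambda d: -sum(w in d.lower() for w in words))
        return scored[:k]

def generate(query: str, context: list[str]) -> str:
    """Stands in for the model call, constrained to the retrieved context."""
    return f"Answer to '{query}' using: {context}"

def guardrail(text: str) -> str:
    """Output filter: a place to enforce policy before anything is returned."""
    banned = ["ssn"]
    return "[redacted]" if any(b in text.lower() for b in banned) else text

def answer(query: str, retriever: Retriever) -> str:
    context = retriever.search(query)
    log.info("retrieved %d documents", len(context))   # payload-logging hook
    return guardrail(generate(query, context))

retriever = Retriever(["Pricing policy for enterprise plans.", "Refund policy details."])
print(answer("What is the refund policy?", retriever))
```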
We have already seen customers take advantage of this modularity to drive better end-to-end quality and maintainability of intelligence applications. As an example, FactSet provides financial data, analytics, and software solutions for investment and corporate professionals. They created their own query language, known as FQL, to structure queries on their data, and they wanted to add an English-language interface to their platform while maintaining high-quality output. By using a combination of fine-tuning, Vector Search, and prompting, they were able to deploy their AI agent system to production.
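This is not a description of FactSet's actual implementation; the sketch below only illustrates the general shape of such a text-to-query pipeline, in which retrieval of validated examples, a stand-in for a fine-tuned model, and a deterministic validation step are kept separate. The example questions, the query syntax, and every function name are invented for illustration.

```python
# Hypothetical shape of an English-to-structured-query pipeline.
# The query syntax below is invented and does not reflect real FQL.

EXAMPLE_QUERIES = [
    ("average closing price last quarter", "AVG(PRICE_CLOSE, -1Q)"),
    ("total revenue last year", "SUM(REVENUE, -1Y)"),
]

def retrieve_examples(question: str, k: int = 1) -> list[tuple[str, str]]:
    """Vector-search stand-in: pick the example with the most word overlap."""
    words = set(question.lower().split())
    overlap = lambda ex: len(words & set(ex[0].split()))
    return sorted(EXAMPLE_QUERIES, key=overlap, reverse=True)[:k]

def generate_query(question: str, examples: list[tuple[str, str]]) -> str:
    """Stand-in for a fine-tuned model prompted with the retrieved examples."""
    _, template = examples[0]
    return template  # a real model would adapt the template to the question

def validate(query: str) -> bool:
    """Deterministic check before execution, so bad generations never run."""
    return "(" in query and query.endswith(")")

question = "What was the average closing price last quarter"
candidate = generate_query(question, retrieve_examples(question))
print(candidate if validate(candidate) else "rejected")
```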
We see AI agent systems as the vanguard of a new application development paradigm for intelligence applications. Moving from monolithic, hard-to-maintain LLM deployments to a modular, customizable approach is a natural progression that comes with many advantages: higher reliability, easier maintenance, and greater extensibility. Databricks provides the fabric to sew these applications together in a unified platform, with the monitoring and governance structures that enterprises need. Developers who learn to wield these tools for their organizations will have a distinct advantage in building quality applications quickly.