
Retrieval Augmented Generation (RAG)

Create high-quality generative AI deployments using RAG with Databricks

Retrieval Augmented Generation

Retrieval augmented generation (RAG) is a generative AI application pattern that finds data and documents relevant to a question or task and provides them as context to a large language model (LLM), so the model can give more accurate responses.
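The pattern itself is small enough to sketch in a few lines. The snippet below is a minimal illustration rather than Databricks-specific code; `search_index` and `llm` are hypothetical placeholders standing in for any retriever and any chat model.

```python
# Minimal sketch of the RAG pattern. `search_index` and `llm` are
# hypothetical placeholders for any retriever and any LLM client.

def answer_with_rag(question: str, search_index, llm, k: int = 3) -> str:
    # 1. Retrieval: find the k documents most relevant to the question.
    docs = search_index.similarity_search(question, num_results=k)

    # 2. Augmentation: inject the retrieved text into the prompt as context.
    context = "\n\n".join(doc["text"] for doc in docs)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

    # 3. Generation: the LLM answers grounded in the retrieved context.
    return llm(prompt)
```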

Databricks provides a suite of RAG tools that helps you combine and optimize every aspect of the RAG process: data preparation, retrieval models, language models (SaaS or open source), ranking and post-processing pipelines, prompt engineering, and training models on custom enterprise data.


Access to open source and proprietary SaaS models

With Databricks, you can deploy, monitor, govern and query any generative AI model. Popular open source models such as Llama 2, MPT and BGE, as well as proprietary models available through Azure OpenAI, Amazon Bedrock, Amazon SageMaker and Anthropic, can all be managed and governed in Model Serving, making it easy to experiment with models and productionize the best candidate for your RAG application.
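As a rough sketch of what querying a served model looks like, the snippet below uses the MLflow deployments client against a Model Serving endpoint; the endpoint name `my-rag-llm` is an assumption for illustration.

```python
# Hedged sketch: query a chat model behind a Databricks Model Serving
# endpoint via the MLflow deployments client. The endpoint name
# "my-rag-llm" is an assumed placeholder.
from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")

response = client.predict(
    endpoint="my-rag-llm",  # assumed endpoint name
    inputs={
        "messages": [
            {"role": "user", "content": "Summarize our Q3 support tickets."}
        ],
        "max_tokens": 256,
    },
)
print(response)
```

Because every model sits behind the same serving interface, swapping in a different candidate model means changing only the endpoint name, not the application code.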

Automated real-time pipelines for any type of data

Databricks natively supports serving and indexing your data for online retrieval. For unstructured data (text, images and video), Vector Search automatically indexes and serves the data, making it accessible to RAG applications without the need to build separate data pipelines. Under the hood, Vector Search manages failures, handles retries and optimizes batch sizes for the best performance, throughput and cost. For structured data, Feature and Function Serving provides millisecond-scale queries of contextual data, such as user or account information, that enterprises often inject into prompts to personalize responses.
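A minimal retrieval sketch with the Vector Search Python client might look like the following; the endpoint name, index name and column names are assumptions for illustration.

```python
# Hedged sketch: retrieve context chunks from a Databricks Vector Search
# index. Endpoint, index and column names are assumed placeholders.
from databricks.vector_search.client import VectorSearchClient

vsc = VectorSearchClient()
index = vsc.get_index(
    endpoint_name="rag-endpoint",          # assumed endpoint name
    index_name="main.docs.support_index",  # assumed Unity Catalog index
)

# Return the 3 chunks most similar to the user's question.
results = index.similarity_search(
    query_text="How do I rotate my API keys?",
    columns=["chunk_id", "text"],
    num_results=3,
)
```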

Move RAG applications quickly to production

Databricks makes it easy to deploy, govern, query and monitor large language models, whether they are fine-tuned on Databricks, pre-deployed by Databricks or sourced from any other model provider. Databricks Model Serving handles automated container builds and infrastructure management, reducing maintenance costs and speeding up deployment.
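To give a sense of how little deployment code this implies, here is a hedged sketch that creates a serving endpoint with the MLflow deployments client; the Unity Catalog model name and endpoint name are assumptions.

```python
# Hedged sketch: stand up a Model Serving endpoint for a registered model.
# Databricks builds the container and provisions infrastructure behind
# this call; model and endpoint names are assumed placeholders.
from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")

client.create_endpoint(
    name="my-rag-llm",  # assumed endpoint name
    config={
        "served_entities": [
            {
                "entity_name": "main.models.rag_chain",  # assumed UC model
                "entity_version": "1",
                "workload_size": "Small",
                "scale_to_zero_enabled": True,
            }
        ]
    },
)
```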

Governance built in

Databricks has security, governance and monitoring built in. RAG applications get fine-grained access controls on data and models, and you can set rate limits and track lineage across all models. This helps ensure the RAG application won't expose confidential data to users who shouldn't have access.
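As one illustration, access to the tables behind a RAG index can be managed in code through Unity Catalog. The sketch below uses the Databricks SDK for Python; the table and group names are assumptions.

```python
# Hedged sketch: grant a group read access to the source table feeding a
# RAG index via Unity Catalog. Table and group names are assumed
# placeholders; the same permissions model covers models and indexes.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import catalog

w = WorkspaceClient()
w.grants.update(
    securable_type=catalog.SecurableType.TABLE,
    full_name="main.docs.support_tickets",  # assumed table
    changes=[
        catalog.PermissionsChange(
            add=[catalog.Privilege.SELECT],
            principal="rag-app-readers",  # assumed group
        )
    ],
)
```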

Ensure quality and safety in production

To meet the standard of quality required for customer-facing applications, AI output must be accurate, current, aware of your enterprise context, and safe. Databricks makes it easy to understand model quality with automated LLM evaluation, improving the helpfulness, relevance and accuracy of RAG chatbot responses. Lakehouse Monitoring automatically scans application outputs for toxic, hallucinated or otherwise unsafe content, and this data can then feed dashboards, alerts or other downstream data pipelines for follow-up action.
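As a small illustration of automated evaluation, the sketch below scores a static set of application outputs with mlflow.evaluate; the example rows are invented for illustration.

```python
# Hedged sketch: score a batch of logged RAG outputs against references
# using MLflow's built-in question-answering metrics. The rows below are
# invented placeholders for illustration.
import mlflow
import pandas as pd

eval_data = pd.DataFrame(
    {
        "inputs": ["How do I rotate my API keys?"],
        "predictions": ["Open Settings > Access tokens and regenerate."],
        "ground_truth": ["Regenerate the token under Settings > Access tokens."],
    }
)

results = mlflow.evaluate(
    data=eval_data,                   # already contains model outputs
    predictions="predictions",        # column holding the app's answers
    targets="ground_truth",           # column holding reference answers
    model_type="question-answering",  # built-in QA metrics
)
print(results.metrics)
```

Metrics like these can be logged on a schedule, so regressions in answer quality surface in dashboards and alerts rather than in customer conversations.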