Streamline AI Agent Evaluation with New Synthetic Data Capabilities

Our customers continue to shift from monolithic prompts with general-purpose models to specialized agent systems to achieve the quality needed to drive ROI...

Databricks announces significant improvements to the built-in LLM judges in Agent Evaluation

An improved answer-correctness judge in Agent Evaluation. Agent Evaluation enables Databricks customers to define, measure, and understand how to improve the quality of...
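
Not part of the listing above, but as a rough illustration of what running Agent Evaluation's built-in judges looks like: the sketch below assumes a Databricks workspace with the databricks-agents package installed, and the column names (request, response, expected_response) follow Agent Evaluation's documented input schema. Treat the data and details as placeholders, not a definitive recipe.

```python
import mlflow
import pandas as pd

# Hypothetical evaluation set; column names follow Agent Evaluation's
# input schema (an assumption for this sketch).
eval_df = pd.DataFrame({
    "request": ["How do I enable Lakehouse Monitoring on a table?"],
    "response": ["Create a monitor on the Unity Catalog table via the Quality tab or the API."],
    "expected_response": ["Use Lakehouse Monitoring to create a monitor on the Unity Catalog table."],
})

# model_type="databricks-agent" routes each row through the built-in
# LLM judges, including the answer-correctness judge described above.
results = mlflow.evaluate(
    data=eval_df,
    model_type="databricks-agent",
)
print(results.metrics)
```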

Announcing Mosaic AI Agent Framework and Agent Evaluation

Databricks announced the public preview of Mosaic AI Agent Framework and Agent Evaluation alongside our Generative AI Cookbook at the Data + AI...

Lakehouse Monitoring: A Unified Solution for Quality of Data and AI

Databricks Lakehouse Monitoring allows you to monitor all your data pipelines – from data to features to ML models – without additional...

Announcing MLflow 2.8 LLM-as-a-judge metrics and Best Practices for LLM Evaluation of RAG Applications, Part 2

Today we're excited to announce that MLflow 2.8 supports our LLM-as-a-judge metrics, which can help save time and costs while providing an approximation of...
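
As a hedged sketch of the LLM-as-a-judge workflow this post describes (not taken from the post itself): the example below uses MLflow's GenAI metrics with mlflow.evaluate. The judge model URI, evaluation data, and registered model URI are all placeholder assumptions for illustration.

```python
import mlflow
import pandas as pd
from mlflow.metrics.genai import answer_correctness

# Hypothetical Q&A evaluation data with reference answers.
eval_df = pd.DataFrame({
    "inputs": ["What does MLflow add for LLM evaluation?"],
    "ground_truth": ["Built-in LLM-as-a-judge metrics usable from mlflow.evaluate."],
})

# LLM-as-a-judge metric; the judge model URI is a placeholder and
# depends on which provider you have configured.
correctness = answer_correctness(model="openai:/gpt-4")

results = mlflow.evaluate(
    model="models:/my_rag_app/1",   # hypothetical registered model URI
    data=eval_df,
    targets="ground_truth",
    model_type="question-answering",
    extra_metrics=[correctness],
)
print(results.metrics)
```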

Announcing Inference Tables: Simplified Monitoring and Diagnostics for AI models

Have you ever deployed an AI model, only to discover it's delivering unexpected results in a real-world setting? Monitoring models is as crucial...

Best Practices for LLM Evaluation of RAG Applications

Chatbots are the most widely adopted use case for leveraging the powerful chat and reasoning capabilities of large language models (LLMs). The retrieval...