Methods for Evaluating Your GenAI Application Quality
OVERVIEW
| EXPERIENCE | In Person |
| --- | --- |
| TYPE | Breakout |
| TRACK | Generative AI |
| TECHNOLOGIES | Databricks Experience (DBX), AI/Machine Learning, GenAI/LLMs, MLflow |
| SKILL LEVEL | Intermediate |
| DURATION | 40 min |
Ensuring the quality and reliability of Generative AI applications in production is paramount. This session dives into the suite of tools provided by Databricks, including inference tables, Lakehouse Monitoring, and MLflow, for rigorous evaluation and quality assurance of model responses. Discover how to harness these components to conduct both offline evaluations and real-time monitoring, ensuring your GenAI applications meet the highest standards of performance and reliability.
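To give a flavor of the real-time monitoring side, here is a minimal sketch of reading a Model Serving inference table (a Delta table of request/response payloads) with PySpark and aggregating latency and error rate by hour. The table name `main.default.my_endpoint_payload` and the column names (`timestamp_ms`, `execution_time_ms`, `status_code`) are assumptions; check the schema of your endpoint's inference table before running.

```python
# Minimal sketch: offline/periodic monitoring over a serving endpoint's inference table.
# Table and column names below are assumptions about a typical payload table schema.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical inference table populated by Model Serving.
inference_df = spark.table("main.default.my_endpoint_payload")

hourly_stats = (
    inference_df
    # timestamp_ms is assumed to be milliseconds since epoch; convert to a timestamp.
    .withColumn("ts", (F.col("timestamp_ms") / 1000).cast("timestamp"))
    .groupBy(F.window("ts", "1 hour"))
    .agg(
        F.count("*").alias("requests"),
        F.avg("execution_time_ms").alias("avg_latency_ms"),
        # Fraction of non-200 responses per hour.
        F.avg((F.col("status_code").cast("int") != 200).cast("double")).alias("error_rate"),
    )
    .orderBy("window")
)

hourly_stats.show(truncate=False)
```

A scheduled job over this table (or a Lakehouse Monitoring monitor attached to it) can surface the same aggregates on a dashboard and alert when error rate or latency drifts.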
We'll explore best practices for using LLMs as judges to assess response quality, integrating MLflow to track experiments and model versions, and leveraging the capabilities of inference tables and Lilac for enhanced model management and evaluation. You'll learn how to optimize your workflow and ensure your GenAI applications are robust, scalable, and aligned with your production goals.
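For the offline, LLM-as-judge side, the following is a minimal sketch using `mlflow.evaluate` with one of MLflow's built-in GenAI judge metrics on a static dataset. The judge model URI (`endpoints:/databricks-meta-llama-3-70b-instruct`), the toy evaluation DataFrame, and the column names are assumptions; substitute the judge endpoint or provider model your workspace actually exposes.

```python
# Minimal sketch: LLM-as-judge evaluation of pre-computed responses with mlflow.evaluate.
import mlflow
import pandas as pd
from mlflow.metrics.genai import answer_correctness

# Toy evaluation set; in practice this would come from your eval dataset or inference table.
eval_data = pd.DataFrame(
    {
        "inputs": ["What is MLflow?"],
        "predictions": ["MLflow is an open source platform for the ML lifecycle."],
        "ground_truth": ["MLflow is an open source platform for managing the ML lifecycle."],
    }
)

with mlflow.start_run(run_name="llm-judge-eval"):
    results = mlflow.evaluate(
        data=eval_data,
        predictions="predictions",   # column holding the app's responses
        targets="ground_truth",      # column holding reference answers
        evaluators="default",
        extra_metrics=[
            # Judge model URI is a placeholder for a serving endpoint or provider URI.
            answer_correctness(model="endpoints:/databricks-meta-llama-3-70b-instruct")
        ],
    )
    print(results.metrics)  # aggregate judge scores, also logged to the MLflow run
```

Per-row scores and judge justifications should appear in `results.tables["eval_results_table"]`, and the aggregates are logged to the active MLflow run, which makes comparisons across model versions straightforward.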
SESSION SPEAKERS
Alkis Polyzotis, Senior Staff Software Engineer, Databricks
Michael Carbin, Principal Researcher, Databricks