Evaluation-Driven Development Workflows: Best Practices and Real-World Scenarios
Overview
Experience | In Person |
---|---|
Type | Breakout |
Track | Artificial Intelligence |
Industry | Enterprise Technology, Manufacturing, Financial Services |
Technologies | MLflow, Mosaic AI |
Skill Level | Intermediate |
Duration | 40 min |
In enterprise AI, Evaluation-Driven Development (EDD) makes systems more reliable and efficient by embedding continuous assessment and improvement into the AI development lifecycle. High-quality evaluation datasets are built through document analysis, Mosaic AI's synthetic data generation API, validation by subject-matter experts (SMEs), and relevance filtering, which reduces manual effort and accelerates workflows.
EDD focuses on metrics such as context relevance, groundedness, and response accuracy to identify and address issues such as retrieval errors or model limitations. Custom LLM judges, tailored to domain-specific needs such as PII detection or tone assessment, extend these evaluations.
By leveraging tools such as Mosaic AI Agent Framework and Agent Evaluation, along with MLflow, EDD automates data tracking, streamlines workflows, and quantifies improvements, helping teams deliver scalable, high-performing systems that drive measurable organizational value.
Session Speakers
Arthur Dooner
Senior Specialist Solutions Architect
Databricks
Wenwen Xie
Specialist Solutions Architect
Databricks