Model Alignment at Scale using RL from AI Feedback on Databricks
OVERVIEW
| EXPERIENCE | In Person |
| --- | --- |
| TYPE | Breakout |
| TRACK | Generative AI |
| INDUSTRY | Media and Entertainment, Retail and CPG - Food, Financial Services |
| TECHNOLOGIES | AI/Machine Learning, GenAI/LLMs, MLflow |
| SKILL LEVEL | Advanced |
| DURATION | 40 min |
Refining large language models to meet specific business objectives can be challenging. Traditional techniques such as on-the-fly tuning and supervised fine-tuning often fail to adapt LLMs to unique requirements, such as adherence to a strict code of conduct or serving niche markets. To address this, we'll show how Reinforcement Learning from AI Feedback (RLAIF) can be applied on Databricks using an open LLM as a reward model, minimizing the need for human intervention in ranking model outputs.

In this session, we'll explore the structure of RLAIF, its practical use, and its advantages over traditional RLHF, including cost efficiency and operational simplicity. We'll back up the discussion with a demo showing how RLAIF aligns LLMs with business-specific requirements in a simple use case, and we'll conclude by summarizing the key takeaways and offering a perspective on the future of model alignment at scale.
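To make the idea concrete, here is a minimal sketch of the core RLAIF step: using an open LLM as the feedback model that ranks candidate responses, replacing the human labelers used in RLHF. The model name, judge prompt, and parsing logic below are illustrative assumptions, not the session's actual implementation.

```python
# Sketch of AI-feedback preference labeling for RLAIF (assumptions noted inline).
from transformers import pipeline

# Any open instruction-tuned model can serve as the judge; this name is an assumption.
judge = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")

# Hypothetical judging prompt encoding the business policy the LLM should enforce.
JUDGE_TEMPLATE = (
    "You are grading two answers against our code of conduct.\n"
    "Question: {question}\n"
    "Answer A: {a}\n"
    "Answer B: {b}\n"
    "Reply with only the letter of the better answer (A or B): "
)

def rank_pair(question: str, a: str, b: str) -> dict:
    """Ask the judge LLM which response better satisfies the policy and
    return a preference record (chosen / rejected)."""
    prompt = JUDGE_TEMPLATE.format(question=question, a=a, b=b)
    verdict = judge(
        prompt, max_new_tokens=4, do_sample=False, return_full_text=False
    )[0]["generated_text"].strip().upper()
    chosen, rejected = (a, b) if verdict.startswith("A") else (b, a)
    return {"prompt": question, "chosen": chosen, "rejected": rejected}
```

Preference pairs produced this way can then train a reward model or drive a preference-optimization fine-tuning run, with experiments and model versions tracked in MLflow on Databricks.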
SESSION SPEAKERS
Michael Shtelma
Lead Specialist Solutions Architect
Databricks