SESSION
LLM Evaluation: Auditing Fine-Tuned LLMs for Guaranteed Output Quality
OVERVIEW
| EXPERIENCE | In Person |
|---|---|
| TYPE | Breakout |
| TRACK | Generative AI |
| INDUSTRY | Enterprise Technology |
| TECHNOLOGIES | AI/Machine Learning, GenAI/LLMs, MLflow |
| SKILL LEVEL | Intermediate |
| DURATION | 40 min |
Information retrieval from e-commerce product data sheets is a complex challenge and can incur high costs when done manually. To address it, Mirakl developed an innovative solution that leverages the power of fine-tuned LLMs. Although LLMs have proven capable across a wide range of tasks, they are far from perfect. Trained mainly for next-token prediction on broad data, they can produce incorrect generations, at times caused by a lack of context in the prompt (e.g., the absence of chain-of-thought reasoning) or by resemblance to very common token sequences. In this session, we will cover:
- Qualitative evaluation: language model quality metrics and hallucination detection
- Using MLflow to automate LLM evaluation and monitoring (see the sketch below)
- Iterative quality improvement through prompt engineering strategies and dataset curation
These methods allowed us to iterate quickly on prompts and fine-tuned models and make them trustworthy enough for production.
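To illustrate the MLflow bullet above, here is a minimal sketch of automated LLM evaluation with `mlflow.evaluate()` on a static set of generations. It is not Mirakl's actual pipeline: the column names, example product-sheet prompts, and reference answers are illustrative assumptions.

```python
# Minimal sketch (not the speakers' actual pipeline): scoring pre-computed
# generations from a fine-tuned model with MLflow's built-in LLM evaluation.
# Column names and example rows below are illustrative assumptions.
import mlflow
import pandas as pd

# Pre-computed model outputs for a product-attribute extraction task,
# alongside reference answers curated from the product data sheets.
eval_data = pd.DataFrame(
    {
        "inputs": [
            "Extract the screen size from: '55-inch 4K UHD Smart TV, 3 HDMI ports'",
            "Extract the battery capacity from: 'Cordless drill, 18V, 2.0Ah Li-ion battery'",
        ],
        "predictions": ["55 inches", "2.0 Ah"],
        "ground_truth": ["55 inches", "2.0 Ah"],
    }
)

with mlflow.start_run(run_name="fine_tuned_llm_eval"):
    results = mlflow.evaluate(
        data=eval_data,
        predictions="predictions",        # column holding the LLM outputs
        targets="ground_truth",           # column holding the reference answers
        model_type="question-answering",  # adds exact_match and text-quality metrics
    )
    # Aggregate metrics and the per-row evaluation table are logged to the run,
    # which makes it easy to compare prompt or dataset iterations side by side.
    print(results.metrics)
```

Beyond the default metrics, MLflow's judge-based GenAI metrics (e.g., `mlflow.metrics.genai.faithfulness` or `answer_similarity`) can be passed via `extra_metrics` to help flag likely hallucinations, at the cost of calling an external judge model.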
SESSION SPEAKERS
Pierre Lourdelet
Data Scientist
Mirakl
Loic Pauletto
Data Scientist
Mirakl