Scaling Generative AI: Batch Inference Strategies for Foundation Models
Overview
| Experience | In Person |
|---|---|
| Type | Breakout |
| Track | Artificial Intelligence |
| Industry | Enterprise Technology |
| Technologies | MLflow, Databricks SQL, Mosaic AI |
| Skill Level | Intermediate |
| Duration | 40 min |
Curious how to apply resource-intensive generative AI models across massive datasets without breaking the bank? This session reveals efficient batch inference strategies for foundation models on Databricks. Learn how to architect scalable pipelines that process large volumes of data through LLMs, text-to-image models and other generative AI systems while optimizing for throughput, cost and quality.
Key takeaways:
- Implementing efficient batch processing patterns for foundation models using AI Functions (a minimal sketch follows this list)
- Optimizing token usage and prompt engineering for high-volume inference
- Balancing compute resources between CPU preprocessing and GPU inference
- Parallelizing and chunking large datasets through generative models (see the chunking sketch further below)
- Managing model weights and memory requirements across distributed inference tasks
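To make the first takeaway concrete, here is a minimal sketch of the AI Functions batch pattern on Databricks, built around the real `ai_query()` SQL function. The table names and the serving endpoint name are placeholders, and this is an illustration of the general technique, not necessarily the pipeline the speakers will present:

```python
# Minimal sketch: batch inference with the Databricks ai_query() AI Function.
# Assumes a Databricks notebook, where `spark` is predefined; the catalog,
# table, and endpoint names below are placeholders, not from the session.
from pyspark.sql import functions as F

tickets = spark.table("main.support.tickets")  # hypothetical source table

summaries = tickets.withColumn(
    "summary",
    F.expr(
        "ai_query('databricks-meta-llama-3-1-70b-instruct', "
        "CONCAT('Summarize this ticket in one sentence: ', body))"
    ),
)

# Persist results so downstream jobs never re-run the expensive inference.
summaries.write.mode("overwrite").saveAsTable("main.support.ticket_summaries")
```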
You'll discover how to run data of any scale through your generative AI models efficiently.
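On the chunking takeaway, one common pattern (again an assumption about the approach, not a confirmed part of the talk) is to split long documents into overlapping windows so that each inference request fits the model's context limit:

```python
# Illustrative helper: split long text into overlapping chunks so each
# inference request stays within the model's context window.
# The sizes are arbitrary placeholders; tune them to your model and tokenizer.
def chunk_text(text: str, max_chars: int = 4000, overlap: int = 200) -> list[str]:
    assert overlap < max_chars, "overlap must be smaller than the chunk size"
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # overlap keeps context continuous across chunks
    return chunks
```

Each chunk can then be sent through `ai_query()` (or any serving endpoint) independently, which is what makes the workload parallelizable across a cluster.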
Session Speakers
Ankit Mathur
Engineering Lead, AI Serving
Databricks
Andrew Shieh
Software Engineer
Databricks