Scaling Generative AI: Batch Inference Strategies for Foundation Models
Overview
| Experience | In Person |
|---|---|
| Type | Breakout |
| Track | Artificial Intelligence |
| Industry | Enterprise Technology |
| Technologies | MLflow, Databricks SQL, Mosaic AI |
| Skill Level | Intermediate |
| Duration | 40 min |
Curious how to apply resource-intensive generative AI models across massive datasets without breaking the bank? This session reveals efficient batch inference strategies for foundation models on Databricks. Learn how to architect scalable pipelines that process large volumes of data through LLMs, text-to-image models and other generative AI systems while optimizing for throughput, cost and quality.
Key takeaways:
- Implementing efficient batch processing patterns for foundation models using AI Functions (a minimal sketch follows this list)
- Optimizing token usage and prompt engineering for high-volume inference
- Balancing compute resources between CPU preprocessing and GPU inference
- Parallelizing and chunking large datasets through generative models (see the chunking sketch further below)
- Managing model weights and memory requirements across distributed inference tasks
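To make the first takeaway concrete, here is a minimal sketch of the AI Functions batch pattern on Databricks, built around the real `ai_query()` SQL function. The table names and the serving endpoint name are placeholders, and this is an illustration of the general technique, not necessarily the pipeline the speakers will present:

```python
# Minimal sketch: batch inference with the Databricks ai_query() AI Function.
# Assumes a Databricks notebook, where `spark` is predefined; the catalog,
# table, and endpoint names below are placeholders, not from the session.
from pyspark.sql import functions as F

tickets = spark.table("main.support.tickets")  # hypothetical source table

summaries = tickets.withColumn(
    "summary",
    F.expr(
        "ai_query('databricks-meta-llama-3-1-70b-instruct', "
        "CONCAT('Summarize this ticket in one sentence: ', body))"
    ),
)

# Persist results so downstream jobs never re-run the expensive inference.
summaries.write.mode("overwrite").saveAsTable("main.support.ticket_summaries")
```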
You'll discover how to run data of any scale through your generative AI models efficiently.
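On the chunking takeaway, one common pattern (again an assumption about the approach, not a confirmed part of the talk) is to split long documents into overlapping windows so that each inference request fits the model's context limit:

```python
# Illustrative helper: split long text into overlapping chunks so each
# inference request stays within the model's context window.
# The sizes are arbitrary placeholders; tune them to your model and tokenizer.
def chunk_text(text: str, max_chars: int = 4000, overlap: int = 200) -> list[str]:
    assert overlap < max_chars, "overlap must be smaller than the chunk size"
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # overlap keeps context continuous across chunks
    return chunks
```

Each chunk can then be sent through `ai_query()` (or any serving endpoint) independently, which is what makes the workload parallelizable across a cluster.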
Session Speakers
Ankit Mathur
Engineering Lead, AI Serving
Databricks
Andrew Shieh
Software Engineer
Databricks