Scaling Generative AI: Batch Inference Strategies for Foundation Models

Overview

Experience: In Person
Type: Breakout
Track: Artificial Intelligence
Industry: Enterprise Technology
Technologies: MLflow, Databricks SQL, Mosaic AI
Skill Level: Intermediate
Duration: 40 min

Curious how to apply resource-intensive generative AI models across massive datasets without breaking the bank? This session reveals efficient batch inference strategies for foundation models on Databricks. Learn how to architect scalable pipelines that process large volumes of data through LLMs, text-to-image models and other generative AI systems while optimizing for throughput, cost and quality.

Key takeaways:

  • Implementing efficient batch processing patterns for foundation models using AI functions
  • Optimizing token usage and prompt engineering for high-volume inference
  • Balancing compute resources between CPU preprocessing and GPU inference
  • Parallelizing and chunking large datasets for processing through generative models
  • Managing model weights and memory requirements across distributed inference tasks
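The chunking-and-parallelism pattern in the takeaways above can be sketched in a few lines of Python. This is a minimal illustration, not Databricks' implementation: `call_model` is a hypothetical placeholder for a foundation-model endpoint call, and the batch size and worker count are arbitrary example values.

```python
from concurrent.futures import ThreadPoolExecutor

def call_model(prompt: str) -> str:
    # Hypothetical stand-in for a foundation-model request
    # (e.g. a model-serving endpoint call); returns a dummy response here.
    return f"response:{prompt}"

def chunked(items, size):
    # Yield fixed-size batches so memory stays bounded on large datasets.
    for i in range(0, len(items), size):
        yield items[i:i + size]

def batch_infer(prompts, batch_size=4, workers=2):
    # Run each batch's prompts in parallel threads; batches are consumed
    # sequentially, so at most batch_size requests are in flight at once.
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in chunked(prompts, batch_size):
            results.extend(pool.map(call_model, batch))
    return results

print(batch_infer([f"doc-{i}" for i in range(10)]))
```

Because `ThreadPoolExecutor.map` preserves input order, results line up with the source rows, which is what lets a pipeline like this write predictions back alongside the original dataset.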

You'll discover how to process data at any scale through your generative AI models efficiently.

Session Speakers

Ankit Mathur

Engineering Lead, AI Serving
Databricks

Andrew Shieh

Software Engineer
Databricks