Scaling GenAI Inference From Prototype to Production: Real-World Lessons in Speed & Cost
Overview
| Experience | In Person |
|---|---|
| Type | Lightning Talk |
| Track | Artificial Intelligence |
| Industry | Education, Media and Entertainment |
| Technologies | Delta Lake, Data Marketplace, Databricks Workflows |
| Skill Level | Intermediate |
| Duration | 20 min |
This lightning talk dives into real-world GenAI projects that scaled from prototype to production using Databricks’ fully managed tools. Facing cost and time constraints, we leveraged four key Databricks features—Workflows, Model Serving, Serverless Compute, and Notebooks—to build an AI inference pipeline processing millions of documents (text and audiobooks).
This approach enables rapid experimentation, easy tuning of GenAI prompts and compute settings, seamless data iteration, and efficient quality testing, allowing data scientists and engineers to collaborate effectively. Learn how to design modular, parameterized notebooks that run concurrently, manage dependencies, and accelerate AI-driven insights.
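The modular, parameterized fan-out pattern described above can be sketched locally. This is a minimal illustration, not code from the talk: `score_document` is a hypothetical stand-in for a Model Serving endpoint call, and a thread pool stands in for concurrently running notebook tasks, each of which would receive its partition index as a notebook parameter.

```python
from concurrent.futures import ThreadPoolExecutor

def score_document(doc: str) -> dict:
    # Hypothetical stand-in for calling a Model Serving endpoint
    # with a GenAI prompt; returns a per-document result record.
    return {"doc": doc, "summary": doc[:20]}

def partition(docs: list[str], n: int) -> list[list[str]]:
    """Split the corpus into n roughly equal partitions, one per
    parameterized notebook task."""
    return [docs[i::n] for i in range(n)]

def run_partition(docs: list[str], max_workers: int = 8) -> list[dict]:
    """Score one partition of documents concurrently.

    In a Workflows setup, each partition would be processed by its own
    notebook run; the thread pool here mimics that concurrency locally.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(score_document, docs))
```

In a real pipeline, each partition would map to one Databricks Workflows task invoking the same notebook with different parameters, and `score_document` would call the deployed model endpoint.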
Whether you're optimizing AI inference, automating complex data workflows, or architecting next-gen serverless AI systems, this session delivers actionable strategies to maximize performance while keeping costs low.
Session Speakers
Anish Kumar
Lead Engineer
Scribd