Migrating and Optimizing Large-Scale Streaming Applications with Databricks (repeated)
OVERVIEW
| EXPERIENCE | In Person |
| --- | --- |
| TYPE | Breakout |
| TRACK | Data Engineering and Streaming |
| INDUSTRY | Media and Entertainment |
| TECHNOLOGIES | Apache Spark, Developer Experience, Orchestration |
| SKILL LEVEL | Intermediate |
| DURATION | 40 min |
This session is repeated.
Our large-scale streaming application processes hundreds of billions of ad events daily at over 5 GB/s. It transforms, joins, and routes these ad events to hundreds of heterogeneous destinations, enabling real-time analytics, batch reporting, ML-based forecasting, and streaming ad-log delivery for programmatic ad campaigns. In this session, we will discuss how we rearchitected, redeveloped, and migrated this massive application, comprising over 30K lines of code, to a Databricks Spark Structured Streaming architecture. We'll share lessons learned, cover the substantial benefits gained, and detail how we improved performance through memory-related optimizations, Kinesis parameter tuning, parallelizing the output stage within each micro-batch, and other tweaks. We'll introduce FreeWheel, programmatic advertising, the architecture of the larger data platform that incorporates this streaming application, and our monitoring and observability solution. Finally, we'll highlight several Databricks features that enhanced our development experience, such as the Databricks AI assistant.
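As a rough illustration of two of the techniques named above, the sketch below reads from a Kinesis stream with a few tuned source options and fans each micro-batch out to multiple sinks in parallel from `foreachBatch`. It assumes a Databricks runtime with the Kinesis connector available; the stream name, paths, option values, and the `write_one`/`write_all_destinations` helpers are illustrative placeholders, not the configuration or code presented in the session.

```python
from concurrent.futures import ThreadPoolExecutor

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Source: ad events from Kinesis. Option names follow the Databricks Kinesis
# connector documentation; the stream name, region, and tuning values are
# placeholders and would be sized against shard count and per-shard read
# limits in practice.
events = (
    spark.readStream
    .format("kinesis")
    .option("streamName", "ad-events")        # hypothetical stream name
    .option("region", "us-east-1")
    .option("initialPosition", "latest")
    .option("maxRecordsPerFetch", 10000)      # records pulled per shard request
    .option("maxFetchDuration", "10s")        # prefetch window per task
    .load()
)

# Hypothetical sinks: two Delta paths stand in for the hundreds of
# heterogeneous destinations the real pipeline fans out to.
DESTINATIONS = ["/mnt/sinks/reporting", "/mnt/sinks/forecasting"]


def write_one(df, path):
    """Write the micro-batch to a single destination as its own Spark job."""
    df.write.format("delta").mode("append").save(path)


def write_all_destinations(batch_df, batch_id):
    """Fan one micro-batch out to every destination concurrently.

    Writing the sinks one after another leaves the cluster idle between
    jobs; submitting the writes from a small thread pool lets Spark run the
    per-destination jobs in parallel, which is the "parallelize the output
    stage within each micro-batch" idea mentioned in the abstract.
    """
    batch_df.persist()  # avoid recomputing the batch once per destination
    try:
        with ThreadPoolExecutor(max_workers=len(DESTINATIONS)) as pool:
            futures = [pool.submit(write_one, batch_df, p) for p in DESTINATIONS]
            for f in futures:
                f.result()  # re-raise any write failure so the batch fails
    finally:
        batch_df.unpersist()


(
    events.writeStream
    .foreachBatch(write_all_destinations)
    .option("checkpointLocation", "/mnt/checkpoints/ad-events")  # placeholder
    .start()
)
```

The same structure accommodates heterogeneous sinks: each entry in the destination list can carry its own format, mode, and path, with `write_one` dispatching accordingly.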
SESSION SPEAKERS
Sharif Doghmi
Lead Software Engineer
FreeWheel, A Comcast Company
Donghui Li
Lead Software Engineer
FreeWheel, A Comcast Company