Scaling Identity Graph Ingestion to 1M Events/Sec with Spark Streaming & Delta Lake
Overview
Experience | In Person |
---|---|
Type | Breakout |
Track | Data Engineering and Streaming |
Industry | Enterprise Technology |
Technologies | Apache Spark, Delta Lake |
Skill Level | Intermediate |
Duration | 40 min |
Adobe’s Real-Time Customer Data Platform relies on the identity graph to connect over 70 billion identities and deliver personalized experiences. This session will showcase how the platform leverages Databricks, Spark Streaming and Delta Lake, along with 25+ Databricks deployments across multiple regions and clouds — Azure & AWS — to process terabytes of data daily and handle over a million records per second.
The talk will highlight the platform’s ability to scale, demonstrating a 10x increase in ingestion pipeline capacity to accommodate peak traffic during events like the Super Bowl. Attendees will learn about the technical strategies employed, including migrating from Flink to Spark Streaming, optimizing data deduplication, and implementing robust monitoring and anomaly detection. Discover how these optimizations enable Adobe to deliver real-time identity resolution at scale while ensuring compliance and privacy.
Session Speakers
Akanksha Nagpal
/Sr. Data Engineer
Adobe
Samarth Lad
/Sr Software Engineer
Adobe