SESSION

Processing a Trillion Rows Per Day with Delta Lake at Adobe

OVERVIEW

EXPERIENCE: In Person
TYPE: Lightning Talk
TRACK: Data Lakehouse Architecture
INDUSTRY: Enterprise Technology
TECHNOLOGIES: Apache Spark, Delta Lake
SKILL LEVEL: Intermediate
DURATION: 20 min

This is an update on the data patterns and practices we at Adobe have adopted as we scale across 8,000+ self-managed Delta tables. At Adobe Experience Platform, we ingest terabytes of data daily and manage petabytes of data for our customers as part of the Unified Profile offering. This powers a variety of marketing scenarios activated across multiple platforms and channels, such as email and advertisements. We'll discuss:

  • Scaling the writer: the "thousand streams" problem of managing thousands of Structured Streaming writers at scale
  • JVM-agnostic locking for partition-level concurrency control
  • Balancing multi-tenancy and single-tenancy in transaction management and tracking
  • Using append-only Delta tables to track global history at scale
  • Anti-patterns we encountered
  • Data manipulation using UDFs
  • Maintenance operations and their scaling and performance gotchas
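To make the partition-level concurrency idea above concrete, here is a minimal sketch of a per-partition lock manager. The talk's "JVM-agnostic" locking presumably delegates to an external service rather than in-process synchronization; this stand-in keeps the lock table in memory purely for illustration, and the `PartitionLockManager` class and its method names are hypothetical, not Adobe's actual implementation.

```python
import threading
from contextlib import contextmanager

class PartitionLockManager:
    """Illustrative stand-in for an external lock service that grants
    locks per (table, partition) key, so concurrent writers touching
    different partitions never block each other."""

    def __init__(self):
        self._guard = threading.Lock()   # protects the lock table itself
        self._held = {}                  # (table, partition) -> owner id

    def try_acquire(self, table, partition, owner):
        """Non-blocking attempt to lock one partition; True on success."""
        key = (table, partition)
        with self._guard:
            if key in self._held:
                return False
            self._held[key] = owner
            return True

    def release(self, table, partition, owner):
        """Release a lock, but only if the caller still owns it."""
        key = (table, partition)
        with self._guard:
            if self._held.get(key) == owner:
                del self._held[key]

    @contextmanager
    def partition_lock(self, table, partition, owner):
        """Scope a write to one partition: fail fast if it is busy."""
        if not self.try_acquire(table, partition, owner):
            raise RuntimeError(f"{table}/{partition} is locked")
        try:
            yield
        finally:
            self.release(table, partition, owner)
```

In a real deployment the lock table would live in shared storage visible to every writer process, but the contract is the same: a writer scopes its commit to one partition, and conflicting writers to that partition fail fast instead of corrupting the transaction log.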

SESSION SPEAKERS

Yeshwanth Vijayakumar

Director of Engineering
Adobe