SESSION
Processing a Trillion Rows Per Day with Delta Lake at Adobe
OVERVIEW
| EXPERIENCE | In Person |
| --- | --- |
| TYPE | Lightning Talk |
| TRACK | Data Lakehouse Architecture |
| INDUSTRY | Enterprise Technology |
| TECHNOLOGIES | Apache Spark, Delta Lake |
| SKILL LEVEL | Intermediate |
| DURATION | 20 min |
This is an update on the data patterns and practices we've adopted at Adobe as we scale across 8,000+ self-managed Delta tables. At Adobe Experience Platform, we ingest terabytes of data daily and manage petabytes of data for our customers as part of the Unified Profile offering, which powers marketing scenarios activated across platforms and channels such as email and advertising. We'll discuss:
- Scaling the writer: the "Thousand Streams" problem of managing thousands of Structured Streaming writers at scale (see the first sketch after this list)
- JVM-agnostic locking for partition-level concurrency control (second sketch below)
- Balancing multi-tenant and single-tenant transaction management and tracking
- Using append-only Delta tables to track global history at scale (third sketch below)
- Anti-patterns we encountered
- Data manipulation using UDFs (fourth sketch below)
- Maintenance operations and their scaling and performance gotchas (fifth sketch below)
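To ground the "Thousand Streams" bullet, here is a minimal sketch, assuming the delta-spark package and a local Spark session, of fanning out one Structured Streaming writer per tenant with an isolated checkpoint each; the tenant list, paths, and rate source are hypothetical stand-ins, not the pipeline described in the talk.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical local session; assumes the delta-spark package is installed.
spark = (
    SparkSession.builder.appName("thousand-streams-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

tenant_ids = ["tenant_001", "tenant_002"]  # hypothetical; imagine thousands

queries = []
for tenant in tenant_ids:
    # A rate source stands in for the real ingestion bus.
    source = (
        spark.readStream.format("rate").option("rowsPerSecond", 10).load()
        .withColumn("tenant_id", F.lit(tenant))
    )
    # Each writer gets its own checkpoint and table path, so failures and
    # restarts stay isolated per tenant table.
    query = (
        source.writeStream.format("delta")
        .option("checkpointLocation", f"/tmp/checkpoints/{tenant}")
        .outputMode("append")
        .start(f"/tmp/delta/{tenant}")
    )
    queries.append(query)

spark.streams.awaitAnyTermination(30)  # run briefly; sketch only
for query in queries:
    query.stop()
```

The per-tenant checkpoint is the load-bearing detail: every streaming query requires its own checkpoint location, so it cannot be shared across writers.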
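For the partition-level concurrency bullet: the talk covers JVM-agnostic locking, which is not reproduced here; as a stand-in, this sketch leans on Delta's built-in optimistic concurrency, scoping each overwrite to a single tenant's slice with replaceWhere and retrying on conflict. The table path, column name, and backoff policy are assumptions.

```python
import time

from delta.exceptions import ConcurrentAppendException
from pyspark.sql import functions as F


def overwrite_tenant_slice(df, table_path, tenant_id, max_retries=3):
    """Overwrite one tenant's rows, retrying if a concurrent writer collides."""
    for attempt in range(max_retries):
        try:
            (df.write.format("delta")
               .mode("overwrite")
               # replaceWhere scopes the transaction to one predicate, so
               # writers working on different tenants do not conflict.
               .option("replaceWhere", f"tenant_id = '{tenant_id}'")
               .save(table_path))
            return
        except ConcurrentAppendException:
            time.sleep(2 ** attempt)  # exponential backoff before retrying
    raise RuntimeError(f"gave up on {tenant_id} after {max_retries} attempts")


# Hypothetical usage, reusing the `spark` session from the first sketch.
df = spark.range(100).withColumn("tenant_id", F.lit("tenant_001"))
overwrite_tenant_slice(df, "/tmp/delta/profiles", "tenant_001")
```

Depending on the workload, a conflict can also surface as other exceptions in the delta.exceptions family, so a production retry loop would likely catch more than ConcurrentAppendException.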
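For the global-history bullet, a minimal sketch of an append-only Delta table used as a shared event log; the schema is hypothetical, while delta.appendOnly is a standard Delta table property that rejects updates and deletes.

```python
from pyspark.sql import functions as F

# Hypothetical schema; reuses the `spark` session from the first sketch.
spark.sql("""
    CREATE TABLE IF NOT EXISTS global_history (
        tenant_id  STRING,
        table_name STRING,
        op         STRING,
        ts         TIMESTAMP
    ) USING delta
    TBLPROPERTIES ('delta.appendOnly' = 'true')
""")

event = (
    spark.createDataFrame(
        [("tenant_001", "profiles", "UPSERT")],
        ["tenant_id", "table_name", "op"],
    )
    .withColumn("ts", F.current_timestamp())
)

# Append is the only mutation the table accepts, so the recorded history
# is immutable by construction.
event.write.format("delta").mode("append").saveAsTable("global_history")
```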
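For the UDF bullet, a sketch of a vectorized pandas UDF (assuming pyarrow is installed), which amortizes the serialization overhead that row-at-a-time Python UDFs pay on every record; that cost is one plausible gotcha at trillion-row volumes. The normalization rule itself is a made-up example.

```python
import pandas as pd
from pyspark.sql.functions import pandas_udf


@pandas_udf("string")
def normalize_email(emails: pd.Series) -> pd.Series:
    # Hypothetical cleanup rule; the point is the vectorized execution model.
    return emails.str.strip().str.lower()


# Reuses the `spark` session from the first sketch.
profiles = spark.createDataFrame(
    [("tenant_001", "  Alice@Example.COM ")], ["tenant_id", "email"]
)
profiles.withColumn("email", normalize_email("email")).show(truncate=False)
```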
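Finally, for the maintenance bullet, a minimal sketch (assuming delta-spark 2.0+) of the two routine operations: with thousands of streaming writers, small files pile up quickly, so compaction and vacuuming become regular chores with their own gotchas.

```python
from delta.tables import DeltaTable

# Reuses `spark`; the path is hypothetical.
table = DeltaTable.forPath(spark, "/tmp/delta/tenant_001")

# Compact many small files into fewer large ones (OPTIMIZE in SQL terms).
table.optimize().executeCompaction()

# Delete files no longer referenced by the transaction log and older than
# the retention window. Gotcha: shrinking retention below the 168-hour
# default can break concurrent readers and time travel.
table.vacuum(168)
```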
SESSION SPEAKERS
Yeshwanth Vijayakumar
Director of Engineering
Adobe