How Adobe uses Structured Streaming at Scale

May 26, 2021 04:25 PM (PT)

Adobe’s Unified Profile System is the heart of its Experience Platform. It ingests TBs of data a day and is PBs large. As part of this massive growth we have faced multiple challenges in our Apache Spark deployment, which is used from ingestion to processing. We want to share some of the learnings and hard-earned lessons from reaching this scale, specifically with Structured Streaming.

 

Know thy Lag

  • While consuming off a Kafka topic that sees sporadic loads, it’s very important to monitor the consumer lag. It also makes you respect what a beast backpressure is.
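
As a rough illustration of what "lag" means here, the sketch below computes per-partition lag as the gap between the latest offset on the topic and the offset the consumer has reached. The function name, the input dictionaries, and the choice to treat a never-consumed partition as starting at offset 0 are all assumptions made for this example, not details from the talk.

```python
def consumer_lag(latest_offsets, consumed_offsets):
    """Return (per-partition lag, total lag).

    Both arguments map partition id -> offset. A partition missing from
    consumed_offsets is treated as offset 0 (an assumption of this sketch).
    """
    per_partition = {
        partition: max(latest - consumed_offsets.get(partition, 0), 0)
        for partition, latest in latest_offsets.items()
    }
    return per_partition, sum(per_partition.values())


# Hypothetical offsets for a three-partition topic.
latest = {0: 1_500, 1: 900, 2: 2_000}
consumed = {0: 1_200, 1: 900}
per_partition, total = consumer_lag(latest, consumed)
print(per_partition, total)
```

Tracking this number over time, rather than at a single instant, is what shows whether a sporadic burst is being absorbed or is steadily piling up.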

Reading Data In

  • Fan Out Pattern using minPartitions to Use Kafka Efficiently
  • Overload protection using maxOffsetsPerTrigger
  • More Apache Spark Settings used to optimize Throughput
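
The two Kafka source options named above could be set roughly as follows. `minPartitions` asks Spark to split the Kafka topic partitions into at least that many Spark partitions, and `maxOffsetsPerTrigger` caps how many offsets a single micro-batch will read. The broker address, topic name, and numeric values are placeholders chosen for this sketch, and the helper functions are hypothetical.

```python
def kafka_source_options(bootstrap_servers, topic, min_partitions, max_offsets_per_trigger):
    """Build the option map for Spark's Kafka source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # Fan out: read the topic with at least this many Spark partitions.
        "minPartitions": str(min_partitions),
        # Overload protection: upper bound on offsets read per micro-batch.
        "maxOffsetsPerTrigger": str(max_offsets_per_trigger),
    }


def read_kafka_stream(spark, options):
    """Create the streaming DataFrame; `spark` is an existing SparkSession."""
    return spark.readStream.format("kafka").options(**options).load()


# Placeholder values; tune them against the observed consumer lag.
options = kafka_source_options("broker-1:9092", "profile-events", 120, 500_000)
print(options)
```

Keeping the options in a plain dictionary makes it easy to vary them per environment before handing them to `read_kafka_stream`.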

MicroBatching Best Practices

  • map() + foreach() vs mapPartitions() + foreachPartition()
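
One common reason to prefer the per-partition variants is that expensive setup (for example, opening a connection) is paid once per partition instead of once per record. The plain-Python sketch below mimics that difference without needing a Spark cluster. `CountingClient` is a made-up stand-in for such a resource, and the assumption that this is the motivation discussed in the session is ours.

```python
class CountingClient:
    """Stand-in for an expensive resource such as a database connection."""
    opened = 0

    def __init__(self):
        CountingClient.opened += 1

    def enrich(self, record):
        return record * 2


def enrich_record(record):
    # Shape of a function passed to map(): a new client for every record.
    return CountingClient().enrich(record)


def enrich_partition(records):
    # Shape of a function passed to mapPartitions(): one client per partition.
    client = CountingClient()
    for record in records:
        yield client.enrich(record)


partitions = [[1, 2, 3], [4, 5]]

per_record = [enrich_record(r) for part in partitions for r in part]
opened_per_record = CountingClient.opened

CountingClient.opened = 0
per_partition = [r for part in partitions for r in enrich_partition(iter(part))]
opened_per_partition = CountingClient.opened

print(opened_per_record, opened_per_partition)  # 5 2
```

Both approaches produce the same output records; only the number of clients created differs.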

Apache Spark Speculation and its Effects

Calculating Streaming Statistics

  • Windowing
    • Importance of the State Store
    • RocksDB FTW
  • Broadcast joins
  • Custom Aggregators
  • OffHeap Counters using Redis
    • Pipelining
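
To make the pipelining idea concrete, here is a minimal sketch that pre-aggregates counts in memory and then sends the increments to Redis in batches, so many commands share one network round trip. It assumes a client that follows the redis-py interface (`pipeline(transaction=False)`, `incrby`, `execute`). The function name, the `count:` key prefix, and the batch size are hypothetical.

```python
from collections import Counter


def flush_counts(client, keys, batch_size=1000):
    """Send one INCRBY per distinct key through a pipeline.

    Returns the number of round trips (pipeline executions) performed.
    """
    counts = Counter(keys)  # collapse repeated keys before touching the network
    pipe = client.pipeline(transaction=False)
    pending = 0
    round_trips = 0
    for key, amount in counts.items():
        pipe.incrby(f"count:{key}", amount)
        pending += 1
        if pending == batch_size:
            pipe.execute()  # flush a full batch in one round trip
            round_trips += 1
            pending = 0
    if pending:
        pipe.execute()  # flush the remainder
        round_trips += 1
    return round_trips
```

In a streaming job, a function like this would typically be called once per partition with a client created inside that partition, so the counters live in Redis rather than on the executor heap.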

In this session watch:
Yeshwanth Vijayakumar, Sr Engineering Manager, Adobe, Inc.

 

Yeshwanth Vijayakumar

I am a Sr Engineering Manager/Architect on the Unified Profile Team in the Adobe Experience Platform; it’s a PB-scale store with a strong focus on millisecond latencies and analytical abilities and ...