Auto Scaling Systems With Elastic Spark Streaming - Databricks



Come explore a feature we’ve built that Spark does not support out of the box: the ability to add or remove nodes from always-on, real-time Spark Streaming jobs. Elastic Spark Streaming jobs automatically adjust to the demands of traffic volume. Using a set of configurable utility classes, these jobs scale down when lulls are detected and scale up when load is too high. We process multiple terabytes per day, comprising billions of events, and our traffic pattern has natural peaks and valleys with occasional sustained, unexpected spikes. Elastic jobs have freed us from manual intervention, given developer time back, and made a large financial impact by maximizing resource utilization.
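The talk does not publish the implementation, but the core idea — scale up when load is too high, scale down during lulls — can be sketched as a heuristic that compares each micro-batch’s processing time to the batch interval. Everything below is an illustrative assumption: the function name, thresholds, and bounds are hypothetical, not the speaker’s actual utility classes. In a real job, `processing_ms` would come from something like `StreamingListener.onBatchCompleted`, and the returned target would drive executor requests against the cluster manager.

```python
def desired_executors(current, processing_ms, batch_interval_ms,
                      lo=2, hi=50, up_threshold=0.9, down_threshold=0.4):
    """Hypothetical elastic-scaling heuristic (not the speaker's code).

    If a batch takes nearly as long as (or longer than) the batch
    interval, the job is falling behind, so double capacity. If batches
    finish well under the interval, halve capacity to stop wasting
    nodes. Otherwise hold steady. The result is clamped to [lo, hi].
    """
    ratio = processing_ms / batch_interval_ms
    if ratio > up_threshold:
        target = current * 2          # falling behind: scale up
    elif ratio < down_threshold:
        target = max(current // 2, 1)  # lull detected: scale down
    else:
        target = current               # within the comfort band
    return max(lo, min(hi, target))
```

For example, with a 1,000 ms batch interval, a 950 ms batch would double a 10-executor job to 20, while a 300 ms batch would halve it to 5; the hysteresis band between the two thresholds prevents oscillating on every batch.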

Learn more:

  • Diving into Apache Spark Streaming’s Execution Model
  • Making Apache Spark the Fastest Open Source Streaming Engine
  • About PhuDuc Nguyen

    PhuDuc has been writing software for over 14 years and currently works as a Consulting Engineer at Oracle Data Cloud. He has worked in Big Data for over 5 years, since the days of MapReduce, Pig, and Cascading. Over the last few years, he has transitioned several projects from MapReduce/Cascading to Spark Streaming with Kafka and Cassandra.