PhuDuc Nguyen - Databricks

PhuDuc Nguyen

Consulting Engineer, Oracle Data Cloud

PhuDuc has been writing software for over 14 years and currently works as a Consulting Engineer at Oracle Data Cloud. He has worked in Big Data for over 5 years, going back to the days of MapReduce, Pig, and Cascading. Over the last few years he has transitioned several projects from MapReduce/Cascading to Spark Streaming with Kafka and Cassandra.



Auto Scaling Systems With Elastic Spark Streaming

Summit East 2017

Come explore a feature we've created that is not supported out-of-the-box: the ability to add nodes to, or remove nodes from, always-on, real-time Spark Streaming jobs. Elastic Spark Streaming jobs can automatically adjust to the demands of traffic or volume. Using a set of configurable utility classes, these jobs scale down when lulls are detected and scale up when load is too high. We process multiple terabytes per day, comprising billions of events. Our traffic pattern experiences natural peaks and valleys, with the occasional sustained, unexpected spike. Elastic jobs have freed us from manual intervention, given back developer time, and made a large financial impact through maximized resource utilization.
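The scale-up and scale-down behavior described above could be sketched roughly as follows. This is an illustrative guess, not the utility classes from the talk: the `ScalingPolicy` and `decide_nodes` names, the thresholds, and the choice of batch processing time as the load signal are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical, configurable thresholds (not taken from the talk)."""
    batch_interval_s: float = 10.0  # length of one streaming batch
    scale_up_ratio: float = 0.9     # processing time / interval above this means load is too high
    scale_down_ratio: float = 0.4   # below this counts as a lull
    window: int = 5                 # consecutive batches that must agree
    min_nodes: int = 2
    max_nodes: int = 20


def decide_nodes(policy: ScalingPolicy,
                 current_nodes: int,
                 processing_times_s: Sequence[float]) -> int:
    """Return the node count to request, given recent batch processing times."""
    recent = list(processing_times_s)[-policy.window:]
    if len(recent) < policy.window:
        return current_nodes  # not enough evidence yet, so hold steady
    ratios = [t / policy.batch_interval_s for t in recent]
    if all(r > policy.scale_up_ratio for r in ratios):
        # Sustained pressure: add one node, capped at the maximum.
        return min(current_nodes + 1, policy.max_nodes)
    if all(r < policy.scale_down_ratio for r in ratios):
        # Sustained lull: remove one node, never going below the minimum.
        return max(current_nodes - 1, policy.min_nodes)
    return current_nodes  # mixed signal: do nothing
```

Requiring every batch in the window to agree is one simple way to keep a brief spike or dip from triggering a resize; a real job would also need a way to apply the new node count, which this sketch leaves out.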
