Productionizing a 24/7 Spark Streaming service on YARN - Databricks

At Ooyala we must process over two billion video events a day and provide rich, near-real-time, always-available analytics to thousands of customers. Spark Streaming is at the core of our state-of-the-art ingestion pipeline. While developing this system we encountered and resolved many undocumented challenges, which we would like to share. What challenges arise, and what lessons can be learned, when productionizing a Spark Streaming pipeline on YARN? How do you ensure 24/7 availability and fault tolerance? What are the best practices for Spark Streaming and its integration with Kafka and YARN? How do you monitor and instrument the various stages of the pipeline? We will dive into all of these topics and more.