Spark Streaming extends core Apache Spark to perform large-scale stream processing. It is being rapidly adopted by companies across various business verticals: ad monitoring, real-time analysis of machine data, anomaly detection, etc. This interest is due to its simple, high-level programming model and its seamless integration with SQL querying (Spark SQL), machine learning algorithms (MLlib), etc. However, for building a real-time streaming analytics pipeline, it's not sufficient to be able to easily express your business logic. Running the platform with high uptime and continuously monitoring it poses a lot of operational challenges. Fortunately, Spark Streaming makes all that easy as well. In this talk, I am going to elaborate on various operational aspects of a Spark Streaming application at different stages of deployment: prototyping, testing, monitoring continuous operation, and upgrading. In short, all the recipes that take you from “hello-world” to large-scale production in no time.
Tathagata Das is an Apache Spark committer and a member of the PMC. He's the lead developer behind Spark Streaming and currently develops Structured Streaming. Previously, he was a grad student in the AMPLab at UC Berkeley, where he conducted research on data-center frameworks and networks with Scott Shenker and Ion Stoica.