It can be a frustrating experience for an application developer when her application:
(a) fails before completion,
(b) does not run quickly or efficiently, or
(c) does not produce correct results.
These problems have many possible causes. For example, Spark’s lazy evaluation, while excellent for performance, can make root-cause diagnosis hard. We are working closely with application developers to make diagnosis, tuning, and debugging of Spark applications easy. Our solution is based on holistic analysis and visualization of profiling information gathered from many points in the Spark stack: the program, the execution graph, counters, data samples from RDDs, and time series of metrics exported by various endpoints in Spark, YARN, and the OS, among others. Through a demo-driven walk-through of failed, slow, and incorrect applications taken from everyday use of Spark, we will show how such a solution can substantially improve the productivity of Spark application developers.
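As a rough illustration of why lazy evaluation complicates root-cause diagnosis, the following sketch uses plain Python generators as an analogy. It does not use Spark's API, and the names `build_pipeline` and `failure_stage` are hypothetical. It shows that a faulty step can be defined without any error, and that the failure only surfaces later, when a result is actually requested.

```python
# Plain-Python analogy for lazy evaluation; this is NOT Spark code.
# The function names below are hypothetical and exist only for illustration.

def build_pipeline(lines):
    """Define two lazy steps. No record is parsed or doubled here."""
    parsed = (int(line) for line in lines)   # lazy: int() is not called yet
    doubled = (x * 2 for x in parsed)        # lazy: still no work done
    return doubled


def failure_stage(lines):
    """Report where a bad record causes a failure.

    Returns "definition" if building the pipeline fails,
    "action" if the failure appears only when results are consumed,
    and "none" if everything succeeds.
    """
    try:
        pipeline = build_pipeline(lines)
    except ValueError:
        return "definition"
    try:
        sum(pipeline)  # the "action": all deferred work runs here
    except ValueError:
        return "action"
    return "none"


if __name__ == "__main__":
    # The malformed record is only noticed when the result is consumed.
    print(failure_stage(["1", "2", "oops"]))
    print(failure_stage(["1", "2"]))
```

In this analogy the error is reported at the point of consumption, far from the line that introduced the faulty parsing step, which is the kind of gap that makes diagnosis hard.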
Shivnath Babu is an Associate Professor of Computer Science at Duke University and the CTO at Unravel Data Systems. His research focuses on ease-of-use and manageability of data-intensive systems, automated problem diagnosis, and cluster sizing for applications running on cloud platforms. Shivnath co-founded Unravel to solve the application management challenges that companies face when they adopt systems like Hadoop and Spark. Unravel originated from the Starfish platform built at Duke, which has been downloaded by over 100 companies. Shivnath has received a U.S. National Science Foundation CAREER Award, three IBM Faculty Awards, and an HP Labs Innovation Research Award.