Nicolas Poggi is an IT professional and researcher focused on the performance and scalability of data-intensive applications. He leads a new research project on upcoming architectures for Big Data at the Barcelona Supercomputing Center and Microsoft Research joint center in Barcelona. Nicolas combines a pragmatic approach to performance and scalability, drawn from his industry experience, with research in server resource management and machine learning. He is a frequent speaker at, and organizer for, the Barcelona performance and operations community. He holds a PhD from BarcelonaTech (UPC).
In this talk, we present a comprehensive framework we developed at Databricks for assessing the correctness, stability, and performance of our Spark SQL engine. Apache Spark is one of the most actively developed open source projects, with more than 1,200 contributors from all over the world. At this scale and pace of development, mistakes are bound to happen. We will discuss the various approaches we take, including random query generation, random data generation, random fault injection, and longevity stress tests. We will demonstrate the effectiveness of the framework by highlighting several correctness issues we found through random query generation, as well as critical performance regressions we were able to diagnose within hours thanks to our automated benchmarking tools.