Lazy Join Optimizations Without Upfront Statistics

Modern Data-Intensive Scalable Computing (DISC) systems such as Apache Spark do not support sophisticated cost-based query optimizers, either because they are specifically designed to process data that resides in external storage systems (e.g., HDFS) or because they lack the necessary data statistics. Consequently, many crucial optimizations, such as join ordering and plan selection, are currently out of scope for the optimizers in these DISC systems. Yet join order is one of the most important decisions a cost-based optimizer can make: a poor order can make query response time more than an order of magnitude slower than a better one.

Session hashtag: #SFr4

About Matteo Interlandi

Matteo Interlandi is a Senior Scientist in the Gray Systems Lab (GSL) at Microsoft, working on scalable Machine Learning systems. Before Microsoft, he was a Postdoctoral Scholar in the CS Department at the University of California, Los Angeles, working on Big Data systems. Prior to joining UCLA, he was a researcher at the Qatar Computing Research Institute, and at the Institute for Human and Machine Cognition. He obtained his PhD in Computer Science from the University of Modena and Reggio Emilia.