Towards a Unified Data Analytics Optimizer - Databricks

Today’s big data analytics systems are best effort only: despite their wide adoption, they cannot take a user’s monetary constraints and performance goals as input and automatically configure an analytic job to meet them. Our work takes a step towards a new data analytics optimizer that works for arbitrary dataflow programs and automatically determines the job configuration from user objectives such as latency, throughput, and monetary cost.

At the core of the optimizer are two components: a principled multi-objective optimization framework that lets one explore the tradeoffs between different objectives, and a deep learning-based modeling approach that can learn a model for each user objective, as complex as the user’s computing environment requires. Using both SQL-like and machine learning jobs in Spark, we show that our techniques learn a model of each objective with high accuracy, and that the multi-objective optimizer automatically recommends new configurations that significantly improve performance over configurations set manually by engineers.
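To make the idea of exploring tradeoffs between objectives concrete, here is a minimal sketch of one common way to do it: keeping only the Pareto-optimal configurations and then choosing among them under a user constraint. This is not the optimizer described in the talk. The candidate names, the objective values (which in a real system would come from learned models), and the helper functions `dominates`, `pareto_front`, and `fastest_within_budget` are all hypothetical and chosen for illustration only.

```python
from typing import List, Optional, Tuple

# Each candidate is (config_name, (latency_seconds, cost_dollars)).
# Both objectives are minimized. The values are made-up stand-ins for
# predictions that per-objective models would produce.
Objectives = Tuple[float, float]
Candidate = Tuple[str, Objectives]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other[1], c[1]) for other in candidates)
    ]


def fastest_within_budget(front: List[Candidate], max_cost: float) -> Optional[Candidate]:
    """Pick the lowest-latency candidate whose cost fits the budget, if any."""
    affordable = [c for c in front if c[1][1] <= max_cost]
    return min(affordable, key=lambda c: c[1][0]) if affordable else None


# Hypothetical job configurations with assumed (latency, cost) predictions.
candidates: List[Candidate] = [
    ("A", (10.0, 1.0)),
    ("B", (5.0, 2.0)),
    ("C", (6.0, 3.0)),  # slower and more expensive than B, so dominated
    ("D", (2.0, 4.0)),
]

front = pareto_front(candidates)
choice = fastest_within_budget(front, max_cost=2.5)
```

In this toy data, configuration C drops out because B is both faster and cheaper, and a cost budget of 2.5 selects B from the remaining tradeoff curve. A quadratic scan like this is only practical for small candidate sets.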

Session hashtag: #SAISDev18

About Yanlei Diao

Yanlei Diao is Professor of Computer Science at Ecole Polytechnique and is also a tenured professor at the University of Massachusetts Amherst, USA. Her research interests lie in database systems and big data analytics. She received her PhD in Computer Science from the University of California, Berkeley. Prof. Diao was a recipient of the 2016 ERC Consolidator Award, the 2013 CRA-W Borg Early Career Award (one female computer scientist selected each year for outstanding contributions), the IBM Scalable Innovation Faculty Award, and the NSF CAREER Award.