Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire!

Spark performance on SQL and DataFrame/Dataset workloads has made impressive progress thanks to Catalyst and Tungsten, but there is still a significant gap to what best-of-breed query engines or hand-written low-level C code can achieve on modern server-class hardware. We present Flare, a new experimental back-end for Spark SQL that yields significant speedups by compiling Catalyst query plans to native code. Flare’s low-level implementation takes full advantage of native execution, using techniques such as NUMA-aware scheduling and data layouts to leverage ‘mechanical sympathy’ and bring execution closer to the metal than current JVM-based techniques on big-memory machines. With available memory increasingly in the TB range, Flare makes scale-up on server-class hardware an interesting alternative to scaling out across a cluster, especially in terms of data center costs. This talk will discuss the design of Flare and demonstrate experiments on standard SQL benchmarks that exhibit order-of-magnitude speedups over Spark 2.0.

About Tiark Rompf

Tiark Rompf is an Assistant Professor at Purdue University. His work focuses on advanced compiler technology for big data systems and the associated language support. From 2008 to 2014 he was a member of the Scala team at EPFL, where he made various contributions to the Scala language and toolchain (delimited continuations, efficient immutable data structures, compiler speedups, type system work). From 2012 to 2014 he was a Principal Researcher at Oracle Labs. His work has been featured as a Research Highlight in CACM and has received a Best Paper Award at VLDB and an NSF CAREER Award.