2015 was a year of continued growth for Spark, with numerous additions to the core project and very fast growth of use cases across the industry. In this talk, I’ll look back at how the Spark community is has grown and changed in 2015, based on a large Apache Spark user survey conducted by Databricks. We see some interesting trends in the diversity of runtime environments (which are increasingly not just Hadoop); the types of applications run on Spark; and the types of users, now that features like R support and DataFrames are available in Spark. I’ll also cover the ongoing work in the upcoming releases of Spark to support new use cases.
Matei Zaharia is an Assistant Professor of Computer Science at Stanford University and Chief Technologist at Databricks. He started the Apache Spark project during his PhD at UC Berkeley in 2009, and has worked broadly in datacenter systems, co-starting the Apache Mesos project and contributing as a committer on Apache Hadoop. Today, Matei tech-leads the MLflow development effort at Databricks. Matei’s research work was recognized through the 2014 ACM Doctoral Dissertation Award for the best PhD dissertation in computer science, an NSF CAREER Award and several best paper awards.