Over the past three years, Spark has quickly grown from a research project to one of the most active open source projects in parallel computing. I’ll go through a summary of recent growth, highlighting key contributions from across the community. At the same time, much remains to be done to make big data analysis truly accessible and fast. I’ll sketch how we at Databricks are approaching this problem through our continuing work on Apache Spark, and the aspects of the system that we believe make Spark truly unique for big data.
Matei Zaharia is an Assistant Professor of Computer Science at Stanford University and Chief Technologist at Databricks. He started the Apache Spark project during his PhD at UC Berkeley in 2009, and has worked broadly in datacenter systems, co-starting the Apache Mesos project and contributing as a committer on Apache Hadoop. Today, Matei tech-leads the MLflow development effort at Databricks in addition to other aspects of the platform. Matei’s research work was recognized through the 2014 ACM Doctoral Dissertation Award for the best PhD dissertation in computer science, an NSF CAREER Award, and the US Presidential Early Career Award for Scientists and Engineers (PECASE).