The next release of Apache Spark will be 2.0, a major milestone for the project. In this talk, I’ll cover how the community has grown to reach this point and walk through some of the major features in 2.0. The largest additions are performance improvements for Datasets, DataFrames, and SQL through Project Tungsten, and a new Structured Streaming API that provides simpler and more powerful stream processing. I’ll also discuss what’s in the works for future versions.
Matei Zaharia is an assistant professor of computer science at Stanford University and Chief Technologist at Databricks. He started the Spark project in 2009 during his PhD at UC Berkeley. Before that, Matei worked broadly in datacenter systems, co-starting the Apache Mesos project and serving as a committer on Apache Hadoop. Matei’s research was recognized with the 2014 ACM Doctoral Dissertation Award for the best PhD dissertation in computer science.