2017 continues to be an exciting year for Apache Spark. I will talk about new updates in two major areas in the Spark community this year: stream processing with Structured Streaming, and deep learning with high-level libraries such as Deep Learning Pipelines and TensorFlowOnSpark. In both areas, the community is making powerful new functionality available in the same high-level APIs used in the rest of the Spark ecosystem (e.g., DataFrames and ML Pipelines), and improving both the scalability and ease of use of stream processing and machine learning.
Matei Zaharia is an assistant professor of computer science at Stanford University and Chief Technologist at Databricks. He started the Spark project during his PhD at UC Berkeley in 2009. Before that, Matei worked broadly in datacenter systems, co-starting the Apache Mesos project and contributing as a committer on Apache Hadoop. Matei’s research was recognized through the 2014 ACM Doctoral Dissertation Award for the best PhD dissertation in computer science.
Sue Ann is a software engineer on the machine learning team at Databricks. Before Databricks, she worked at Facebook on Ads Targeting and Commerce. Sue Ann holds a PhD in computer science, specializing in machine learning from Carnegie Mellon University.