In this talk, we will highlight major efforts happening in the Spark ecosystem. In particular, we will dive into the details of adaptive and static query optimizations in Spark 3.0 to make Spark easier to use and faster to run. We will also demonstrate how new features in Koalas, an open source library that provides Pandas-like API on top of Spark, helps data scientists gain insights from their data quicker.
Michael Armbrust is committer and PMC member of Apache Spark and the original creator of Spark SQL. He currently leads the team at Databricks that designed and built Structured Streaming and Databricks Delta. He received his PhD from UC Berkeley in 2013, and was advised by Michael Franklin, David Patterson, and Armando Fox. His thesis focused on building systems that allow developers to rapidly build scalable interactive applications, and specifically defined the notion of scale independence. His interests broadly include distributed systems, large-scale structured storage and query optimization.
Brooke Wenig is a Machine Learning Practice Lead at Databricks. She leads a team of data scientists who develop large-scale machine learning pipelines for customers, as well as teach courses on distributed machine learning best practices. She is a co-author of Learning Spark, 2nd Edition, co-instructor of the Distributed Computing with Spark SQL Coursera course, and co-host of the Data Brew podcast. She received an MS in Computer Science from UCLA with a focus on distributed machine learning. She speaks Mandarin Chinese fluently and enjoys cycling.
Burak Yavuz is a Software Engineer and Apache Spark committer at Databricks. He has been developing Structured Streaming and Delta Lake to simplify the lives of Data Engineers. Burak received his MS in Management Science & Engineering at Stanford and his BS in Mechanical Engineering at Bogazici University, Istanbul.