Continuing with the objectives to make Spark faster, easier, and smarter, Apache Spark 3.0 extends its scope with more than 3000 resolved JIRAs. We will talk about the exciting new developments in the Spark 3.0 as well as some other major initiatives that are coming in the future. In this talk, we want to share with the community many of the more important changes with the examples and demos.
The following features are covered: accelerator-aware scheduling, adaptive query execution, dynamic partition pruning, join hints, new query explain, better ANSI compliance, observable metrics, new UI for structured streaming, new UDAF and built-in functions, new unified interface for Pandas UDF, and various enhancements in the built-in data sources [e.g., parquet, ORC and JDBC].
Xiao Li is an engineering manager, Apache Spark Committer and PMC member at Databricks. His main interests are on Spark SQL, data replication and data integration. Previously, he was an IBM master inventor and an expert on asynchronous database replication and consistency verification. He received his Ph.D. from University of Florida in 2011.
Wenchen Fan is a software engineer at Databricks, working on Spark Core and Spark SQL. He mainly focuses on the Apache Spark open source community, leading the discussion and reviews of many features/fixes in Spark. He is a Spark committer and a Spark PMC member.