A Preview of the Upcoming Apache Spark™ 3.0 Launch

Available On-Demand

Learn about the latest developments in the open-source community with Apache Spark 3.0 and DBR 7.0

The Apache Spark™ 3.0 release brings new capabilities and features to the Spark ecosystem. In this online tech talk from Databricks, we will walk through updates in the Apache Spark 3.0 release as part of our new Databricks Runtime 7.0 Beta, which is available today. Topics covered include:

  • The new Adaptive Query Execution (AQE) framework within Spark 3.0 can yield query performance gains. On a 3TB TPC-DS benchmark, two queries ran more than 1.5x faster, and another 37 queries ran more than 1.1x faster.
  • Dynamic Partition Pruning (DPP) can significantly speed up queries by pruning partitions based on the joins between fact and dimension tables, a pattern common in star schema designs.
  • Accelerator-aware Scheduling helps Spark take advantage of GPUs and other hardware accelerators for certain workloads (e.g., deep learning). This release enhances the scheduler and makes the cluster manager accelerator-aware.
  • Spark 3.0 also introduces new Pandas UDF types and new Pandas function APIs for improved performance and usability.
  • Enhanced monitoring capabilities, including a new UI for Structured Streaming, an enhanced EXPLAIN command, and observable metrics.
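
For readers who want to try some of the features listed above, here is a minimal sketch of how the relevant settings might be passed to a job. It only assembles a spark-submit command line and does not start Spark. The configuration keys are documented Spark 3.0 settings; the values, the helper name spark_submit_command, and the script name my_job.py are illustrative assumptions, not part of the talk.

```python
import shlex

# Documented Spark 3.0 configuration keys; the values are illustrative.
SPARK_3_FEATURE_CONF = {
    # Adaptive Query Execution is off by default in Spark 3.0.
    "spark.sql.adaptive.enabled": "true",
    # Dynamic Partition Pruning; set explicitly here for visibility.
    "spark.sql.optimizer.dynamicPartitionPruning.enabled": "true",
    # Accelerator-aware scheduling: GPUs per executor and per task.
    # Depending on the cluster manager, a GPU discovery script
    # setting may also be required.
    "spark.executor.resource.gpu.amount": "1",
    "spark.task.resource.gpu.amount": "1",
}


def spark_submit_command(app_path, conf):
    """Build a spark-submit argument list (hypothetical helper)."""
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app_path)
    return args


# "my_job.py" is a placeholder application path.
command = spark_submit_command("my_job.py", SPARK_3_FEATURE_CONF)
print(" ".join(shlex.quote(arg) for arg in command))
```

The same keys can be set in a cluster or session configuration instead; which values make sense depends on the workload and the hardware available.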

Featured Speakers

Xiao Li, Engineering Manager, Open Source Spark
Denny Lee, Staff Developer Advocate

Register now to learn more about the latest contributions from the Spark community for fast and scalable data processing, as well as how you can try them out today on Databricks for free.