
What’s Next for the Upcoming Apache Spark 4.0?


OVERVIEW

EXPERIENCE: In Person
TYPE: Breakout
TRACK: Data Engineering and Streaming
INDUSTRY: Enterprise Technology
TECHNOLOGIES: Apache Spark
SKILL LEVEL: Intermediate
DURATION: 40 min

The upcoming Apache Spark 4.0 release brings substantial enhancements to the unified analytics engine, extending its functionality and improving the developer experience. This presentation will highlight:

  • General availability (GA) of Spark Connect, for improved usability and debuggability.
  • Structured Logging for better error analysis and streamlined debugging.
  • Significant PySpark updates, including Python data source APIs, Arrow-optimized UDFs, polymorphic Python UDTFs, improved UDF profiling, and alignment with pandas 2.x for complex data workflows.
  • Expanded SQL capabilities, including ANSI SQL compliance and new support for SQL Cache V2, UDFs, and collations.
  • Enhanced connectivity with new native XML and Databricks connectors.
  • Real-time data processing improvements in Structured Streaming, including the Arbitrary State API v2 and the State Data Source reader.


Attendees will learn how to use the advancements in Apache Spark 4.0 to optimize data processing and analytics.

SESSION SPEAKERS

Xiao Li

Engineering Director
Databricks

Wenchen Fan

Staff Software Engineer
Databricks