SESSION
What’s Next for the Upcoming Apache Spark 4.0?
OVERVIEW
EXPERIENCE | In Person |
---|---|
TYPE | Breakout |
TRACK | Data Engineering and Streaming |
INDUSTRY | Enterprise Technology |
TECHNOLOGIES | Apache Spark |
SKILL LEVEL | Intermediate |
DURATION | 40 min |
The upcoming Apache Spark 4.0 release brings substantial enhancements to the unified analytics engine, refining its functionality and improving the developer experience. This presentation will highlight:
- Spark Connect’s GA for enhanced usability and debuggability.
- Structured Logging for better error analysis and streamlined debugging.
- Significant PySpark updates, including Python data source APIs, Arrow-optimized UDFs, polymorphic Python UDTFs, and improved UDF profiling, plus alignment with pandas 2.x for complex data workflows.
- Expanded SQL capabilities through ANSI SQL compliance and new SQL Cache V2, UDF, and collation support.
- Enhanced connectivity with new native XML and Databricks connectors.
- Improvements in real-time data processing with the Arbitrary State API v2 and the State Data Source reader for Structured Streaming.
Attendees will learn how to use Apache Spark 4.0's advancements for optimized data processing and analytics.
SESSION SPEAKERS
Xiao Li
Engineering Director
Databricks
Wenchen Fan
Staff Software Engineer
Databricks