Dependency Management in Spark Connect: Simple, Isolated, Powerful
OVERVIEW
EXPERIENCE | In Person |
---|---|
TYPE | Lightning Talk |
TRACK | Data Lakehouse Architecture |
INDUSTRY | Enterprise Technology, Professional Services, Financial Services |
TECHNOLOGIES | Apache Spark, Developer Experience |
SKILL LEVEL | Beginner |
DURATION | 20 min |
Managing an application's environment in a distributed computing system can be challenging. Ensuring that every node has the environment it needs to execute code, and determining where the user's code actually lives, are complex tasks, and they become harder still when dependencies must change dynamically at runtime.
This session will explore the new session-based dependency management system in Spark Connect (introduced in Apache Spark™ 3.5.0), which addresses the limitations of static dependency setups in distributed computing environments. We'll discuss how to use the Artifact API to deliver dynamic dependency updates at runtime while maintaining strict isolation across Spark Connect sessions. Through practical examples, learn how to create, package, use and update custom isolated environments for flexible execution of both Python and Scala applications.
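As a rough illustration of the workflow described above, the sketch below packages a local Python package and registers it with a Spark Connect session. It assumes the PySpark client's `SparkSession.addArtifacts` method (with its `pyfile` and `archive` flags) and the `spark.sql.execution.pyspark.python` configuration key; the helper names `build_pyfile_archive` and `ship_dependencies`, the paths, and the `environment` alias are hypothetical and not taken from the session itself.

```python
import os
import zipfile


def build_pyfile_archive(package_dir: str, out_zip: str) -> str:
    """Zip a local Python package so it can be shipped as a pyfile artifact.

    Hypothetical helper: entries are stored relative to the package's parent
    directory, so the package stays importable by name from the zip.
    """
    parent = os.path.dirname(os.path.abspath(package_dir))
    with zipfile.ZipFile(out_zip, "w") as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                full = os.path.join(root, name)
                zf.write(full, os.path.relpath(full, parent))
    return out_zip


def ship_dependencies(spark, pyfile_zip: str, env_archive: str = None) -> None:
    """Attach dependencies to one Spark Connect session (hypothetical helper).

    `spark` is expected to be a Spark Connect SparkSession. Artifacts added
    here are scoped to that session and are not visible to other sessions.
    """
    # Ship pure-Python code that UDFs in this session can import.
    spark.addArtifacts(pyfile_zip, pyfile=True)

    if env_archive is not None:
        # Ship a packed environment (for example, one built with conda-pack).
        # The "#environment" suffix names the directory it is unpacked into.
        spark.addArtifacts(f"{env_archive}#environment", archive=True)
        # Assumption: this key selects the Python interpreter used for UDF
        # execution in this session.
        spark.conf.set(
            "spark.sql.execution.pyspark.python", "environment/bin/python"
        )


# Usage against a running Spark Connect server (address is an assumption):
#
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
#   zip_path = build_pyfile_archive("my_package", "my_package.zip")
#   ship_dependencies(spark, zip_path, env_archive="pyspark_conda_env.tar.gz")
```

Because the artifacts are attached to a session rather than to the cluster, a second session could register a different version of the same package without the two interfering, which is the isolation property the abstract refers to.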
SESSION SPEAKERS
Hyukjin Kwon
Staff Software Engineer
Databricks
Akhil Gudesa
Software Engineer
Databricks