SESSION

Dependency Management in Spark Connect: Simple, Isolated, Powerful

OVERVIEW

EXPERIENCE: In Person
TYPE: Lightning Talk
TRACK: Data Lakehouse Architecture
INDUSTRY: Enterprise Technology, Professional Services, Financial Services
TECHNOLOGIES: Apache Spark, Developer Experience
SKILL LEVEL: Beginner
DURATION: 20 min

Managing an application's environment in a distributed computing setting can be challenging. Ensuring that every node has the environment it needs to execute code, and determining where the user's code actually lives, are complex tasks, and they become harder still when dependencies must change dynamically at runtime.

This session explores the session-based dependency management system in Spark Connect (introduced in Apache Spark™ 3.5.0), which addresses the limitations of static dependency setups in distributed computing environments. We'll discuss how to use the Artifact API to deliver dynamic dependency updates at runtime while maintaining strict isolation across Spark Connect sessions. Through practical examples, learn how to create, package, use, and update custom isolated environments for flexible, seamless execution of both Python and Scala applications.
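As a rough illustration of the workflow the abstract describes, the sketch below packages a small Python module and attaches it to a single Spark Connect session. It relies on `SparkSession.addArtifacts`, which PySpark exposes for Spark Connect sessions from 3.5.0. The server URL `sc://localhost:15002`, the module name `my_helpers`, and the helper functions `package_module`, `attach_dependency`, and `run_example` are assumptions made for this example and are not taken from the talk.

```python
import os
import tempfile
import zipfile


def package_module(source: str, module_name: str, out_dir: str) -> str:
    """Write `source` as <module_name>.py inside a zip archive and return its path."""
    zip_path = os.path.join(out_dir, f"{module_name}.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr(f"{module_name}.py", source)
    return zip_path


def attach_dependency(spark, zip_path: str) -> None:
    """Upload the archive as a Python dependency of this Spark Connect session."""
    # pyfile=True marks the artifact as a Python dependency (.py, .zip, .egg, .jar).
    # Artifacts are tracked per session, so other sessions do not see this module.
    spark.addArtifacts(zip_path, pyfile=True)


def run_example(remote_url: str = "sc://localhost:15002") -> list:
    """Hypothetical end-to-end run; needs a reachable Spark Connect server."""
    # Imported lazily so the packaging helper above works without PySpark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf

    spark = SparkSession.builder.remote(remote_url).getOrCreate()

    @udf("string")
    def shout_udf(s):
        # Resolved on the executor from the artifact uploaded for this session.
        import my_helpers
        return my_helpers.shout(s)

    with tempfile.TemporaryDirectory() as tmp:
        source = "def shout(s):\n    return s.upper() + '!'\n"
        attach_dependency(spark, package_module(source, "my_helpers", tmp))
        df = spark.createDataFrame([("hello",)], ["word"])
        return [row[0] for row in df.select(shout_udf("word")).collect()]
```

Calling `run_example()` against a running Spark Connect server would exercise the full path. Updating the dependency at runtime would then amount to packaging a new version and calling `attach_dependency` again, though how the server treats a re-uploaded artifact of the same name is something to verify against the Spark version in use.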

SESSION SPEAKERS

Hyukjin Kwon

Staff Software Engineer
Databricks

Akhil Gudesa

Software Engineer
Databricks