From Imperative to Declarative Paradigm: Rebuilding a CI/CD Infrastructure Using Hatch and DABs
Overview
| Experience | In Person |
|---|---|
| Type | Breakout |
| Track | Data Engineering and Streaming |
| Industry | Media and Entertainment |
| Technologies | Apache Spark, Databricks Workflows |
| Skill Level | Intermediate |
| Duration | 40 min |
Building and deploying PySpark pipelines to Databricks should be effortless.
However, our team at FreeWheel struggled for a long time with a convoluted, hard-to-maintain CI/CD infrastructure. It followed an imperative paradigm: every project had to implement custom scripts to build artifacts and deploy resources, resulting in redundant boilerplate code and awkward interactions with the Databricks REST API.
We set our minds on rebuilding it from scratch, following a declarative paradigm instead. We will share how we eliminated thousands of lines of code from our repository, created a fully configuration-driven infrastructure where projects can be onboarded easily, and improved the quality of our codebase using Hatch and Databricks Asset Bundles as our tools of choice. In particular, DABs have made deploying across our three environments a breeze, and have allowed us to adopt new features as soon as Databricks releases them. A sketch of what such a declarative configuration can look like follows below.
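To illustrate the declarative approach the session describes, here is a minimal sketch of a Databricks Asset Bundle configuration (`databricks.yml`) with one target per environment. The bundle name, workspace hosts, file path, and cluster settings are hypothetical placeholders, not the speakers' actual setup.

```yaml
# databricks.yml -- hypothetical bundle definition for a PySpark pipeline
bundle:
  name: example_pipeline  # placeholder project name

# One target per environment; deployment differences live in config, not scripts.
targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://dev-workspace.cloud.databricks.com   # placeholder host
  staging:
    workspace:
      host: https://staging-workspace.cloud.databricks.com
  prod:
    mode: production
    workspace:
      host: https://prod-workspace.cloud.databricks.com

# Resources (jobs, pipelines, etc.) are declared once and deployed to any target.
resources:
  jobs:
    example_pipeline_job:
      name: example_pipeline_job
      tasks:
        - task_key: main
          spark_python_task:
            python_file: ./src/main.py   # placeholder entry point
          new_cluster:
            spark_version: "15.4.x-scala2.12"
            node_type_id: "i3.xlarge"
            num_workers: 2
```

With a layout like this, promoting a pipeline between environments is typically a single CLI call (for example, `databricks bundle deploy -t prod`) instead of a custom script driving the Databricks REST API.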
Session Speakers
Luigi Di Tacchio
Software Engineer
FreeWheel
Saswati Bhoi
Sr. SRE
Comcast