Modern Config Driven ELT Framework for Building a Data Lake

May 26, 2021 03:50 PM (PT)

At Northwestern Mutual, we use Spark on Databricks to run Extract, Load, Transform (ELT) workloads. We built a configuration-driven Python framework that lands data from various source systems and transforms it using Databricks Delta SQL. The framework bakes in consistency, performance, and access control while allowing our developers to leverage their existing SQL skill sets. With this framework, our developers write minimal code and spend less time creating and configuring Spark jobs.

The framework ingests a list of job items from a JSON configuration file. Each job item has a command that generates a dataframe and a list of any number of destinations to write that dataframe to. Commands and destinations are specified by type in the configuration, accompanied by command-specific attributes and, if required, another file such as a SQL file. These configurable commands and destinations also let us enforce certain best practices: securing PII data in our destinations, saving data in the correct locations, and connecting to the valid sources for the environment the job runs in.
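
As an illustration of the job-item structure described above, here is a minimal sketch in plain Python. It is not the framework itself: the configuration keys (`jobs`, `command`, `destinations`, `type`), the type names `inline_rows` and `memory_table`, and every function name are assumptions made for this example, and a list of dictionaries stands in for a Spark dataframe so the sketch runs without a cluster.

```python
import json

# Hypothetical configuration. The real framework's schema is not shown in
# this abstract, so every key and type name here is an assumption.
CONFIG = """
{
  "jobs": [
    {
      "name": "load_customers",
      "command": {
        "type": "inline_rows",
        "rows": [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
      },
      "destinations": [
        {"type": "memory_table", "table": "bronze.customers"},
        {"type": "memory_table", "table": "audit.customers_copy"}
      ]
    }
  ]
}
"""

COMMANDS = {}      # command type name -> function(attrs) returning rows
DESTINATIONS = {}  # destination type name -> function(rows, attrs)
TABLES = {}        # in-memory stand-in for real storage


def command(type_name):
    """Register a function as the handler for one command type."""
    def register(fn):
        COMMANDS[type_name] = fn
        return fn
    return register


def destination(type_name):
    """Register a function as the handler for one destination type."""
    def register(fn):
        DESTINATIONS[type_name] = fn
        return fn
    return register


@command("inline_rows")
def inline_rows(attrs):
    # Stand-in for a command that would produce a Spark dataframe,
    # for example by running a SQL file named in the configuration.
    return list(attrs["rows"])


@destination("memory_table")
def memory_table(rows, attrs):
    # Stand-in for a write to a governed storage location.
    TABLES[attrs["table"]] = list(rows)


def run_jobs(config_text):
    """Run each job item: one command, then every listed destination."""
    config = json.loads(config_text)
    for job in config["jobs"]:
        cmd = job["command"]
        if cmd["type"] not in COMMANDS:
            raise ValueError(f"unknown command type: {cmd['type']}")
        rows = COMMANDS[cmd["type"]](cmd)
        for dest in job["destinations"]:
            if dest["type"] not in DESTINATIONS:
                raise ValueError(f"unknown destination type: {dest['type']}")
            DESTINATIONS[dest["type"]](rows, dest)
    return TABLES
```

Calling `run_jobs(CONFIG)` fills both hypothetical tables from the single command result. In a design like this, supporting a new source or sink means registering one more handler function, and job authors only edit the JSON.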

Our key focus for this session will be:

  • The need for a configuration-driven ELT framework
  • Architecture and design of the framework
  • Configuration options and how they can be extended  
  • The security and consistency needs the framework helps us meet, such as securing PII
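
One way a destination step could secure PII before writing is sketched below. The abstract does not say which mechanism the framework uses, so hashing with SHA-256 is an assumed stand-in, and the column set `PII_COLUMNS`, the `allow_pii` attribute, and both function names are hypothetical.

```python
import hashlib

# Hypothetical set of column names treated as PII in this example.
PII_COLUMNS = {"ssn", "email"}


def mask_value(value):
    """Replace a value with its SHA-256 hex digest (an assumed masking choice)."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def secure_rows(rows, destination_attrs):
    """Return copies of the rows with PII columns masked.

    Rows pass through unmasked only when the destination configuration
    explicitly sets the hypothetical "allow_pii" attribute to true.
    """
    if destination_attrs.get("allow_pii", False):
        return [dict(row) for row in rows]
    return [
        {key: mask_value(val) if key in PII_COLUMNS else val
         for key, val in row.items()}
        for row in rows
    ]
```

Because the check lives in the shared write path rather than in each job, a job author cannot skip it by accident; opting out has to be stated in the configuration, where it can be reviewed.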

In this session watch:
Fred Kimball, Software Engineer, Northwestern Mutual
Josh Reilly, Developer, Northwestern Mutual

 

Fred Kimball

Fred Kimball is a Software Engineer at Northwestern Mutual. His responsibilities include building, maintaining, and securing data infrastructure, creating automated build and deployment pipelines, and...

Josh Reilly

Josh Reilly is a Lead Software Engineer at Northwestern Mutual. His role is to provide architectural direction as well as enable his teams to be successful through mentoring and the creation of librar...