Automated Metadata Management in Data Lake – A CI/CD Driven Approach

May 26, 2021 05:00 PM (PT)

As data engineers, we are aware of the trade-offs between development speed, metadata governance, and schema evolution (or restriction) in a rapidly evolving organization. Our day-to-day activities involve adding, removing, and updating tables, protecting PII, and curating and exposing data to our consumers. While our data lake keeps growing exponentially, there is an equal increase in our downstream consumers. The struggle is to balance quickly promoting metadata changes against the robust validation that keeps downstream systems stable. In the relational world, DDL and DML changes can be managed through numerous options available for every kind of database, from the vendor or a third party. As engineers, we developed a tool that uses a centralized, git-managed repository of data schemas in a yml structure, with CI/CD capabilities, to maintain the stability of our data lake and downstream systems.

In this presentation, Northwestern Mutual engineers will discuss how they designed and developed a new end-to-end CI/CD-driven metadata management tool that makes introducing new tables and views, managing access requests, and similar tasks more robust, maintainable, and scalable, all by only checking in yml files. The tool can be used by people who have little or no knowledge of Spark.

Key focus will be: 

  • The need for a metadata management tool in a data lake
  • Architecture and design of the tool
  • Maintaining information on databases, tables, and views (schema, owner, PII, description, etc.) in a simple-to-understand yml structure
  • Live demo of creating a new table with CI/CD promotion to production
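
To make the yml-driven idea concrete, here is a minimal sketch of what a CI check over one table definition might look like. It is an illustration under stated assumptions, not the tool presented in the session: the field names (`database`, `table`, `owner`, `description`, `columns`, `pii`) and the helpers `validate`, `to_ddl`, and `pii_columns` are all hypothetical, and the dict stands in for what a YAML parser such as PyYAML's `yaml.safe_load` would return for a checked-in file.

```python
# Hypothetical table definition, written as the dict a YAML parser would
# return for a checked-in yml file. Every field name is an assumption made
# for illustration; the real tool's yml layout is not given in the abstract.
TABLE_DEF = {
    "database": "sales",
    "table": "customers",
    "owner": "data-platform-team",
    "description": "One row per customer.",
    "columns": [
        {"name": "customer_id", "type": "bigint", "pii": False,
         "description": "Surrogate key"},
        {"name": "email", "type": "string", "pii": True,
         "description": "Contact email"},
    ],
}

REQUIRED_TABLE_KEYS = ("database", "table", "owner", "description", "columns")
REQUIRED_COLUMN_KEYS = ("name", "type", "pii", "description")


def validate(table_def):
    """Return a list of problems; an empty list means the definition passes."""
    errors = []
    for key in REQUIRED_TABLE_KEYS:
        if key not in table_def:
            errors.append(f"missing table field: {key}")
    seen = set()
    for i, col in enumerate(table_def.get("columns", [])):
        for key in REQUIRED_COLUMN_KEYS:
            if key not in col:
                errors.append(f"column {i}: missing field: {key}")
        name = col.get("name")
        if name is not None:
            if name in seen:
                errors.append(f"duplicate column name: {name}")
            seen.add(name)
    return errors


def to_ddl(table_def):
    """Render a Spark SQL style CREATE TABLE statement.

    Sketch only: descriptions are not escaped, so they must not contain quotes.
    """
    cols = ",\n".join(
        f"  {c['name']} {c['type']} COMMENT '{c['description']}'"
        for c in table_def["columns"]
    )
    return (
        f"CREATE TABLE IF NOT EXISTS "
        f"{table_def['database']}.{table_def['table']} (\n"
        f"{cols}\n"
        f") COMMENT '{table_def['description']}'"
    )


def pii_columns(table_def):
    """List the columns flagged as PII, e.g. to drive masking or access rules."""
    return [c["name"] for c in table_def["columns"] if c["pii"]]


if __name__ == "__main__":
    problems = validate(TABLE_DEF)
    if problems:
        # A CI job would fail the build here instead of promoting the change.
        raise SystemExit("\n".join(problems))
    print(to_ddl(TABLE_DEF))
    print("PII columns:", pii_columns(TABLE_DEF))
```

In a pipeline along these lines, the validation step would run on every commit, and the generated DDL would be applied only after the checks pass. That is one way a contributor could add a table without writing any Spark code.
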
In this session watch:
Josh Reilly, Developer, Northwestern Mutual
Keyuri Shah, Lead Software Engineer, Northwestern Mutual

 

Josh Reilly

Josh Reilly is a Lead Software Engineer at Northwestern Mutual. His role is to provide architectural direction as well as enable his teams to be successful through mentoring and the creation of librar...

Keyuri Shah

Keyuri Shah has 13+ years of IT experience across a variety of companies. Her expertise is in designing, prototyping, building and deploying scalable data processing pipelines on dis...