Automating Data Quality Processes at Reckitt

May 26, 2021 12:05 PM (PT)

Reckitt is a fast-moving consumer goods company with a portfolio of famous brands and over 30k employees worldwide. At that scale, small projects can quickly grow into big datasets, and processing and cleaning all that data becomes a challenge. To address it, we created a metadata-driven ETL framework that orchestrates data transformations through parametrised SQL scripts. It lets us define various paths for our data and easily version-control them. Standardising incoming datasets and creating reusable SQL processes has proven to be a winning formula: it has simplified complicated landing/stage/merge processes and made them self-documenting.
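As an illustration only, the idea of metadata driving parametrised SQL can be sketched as below. This is not the framework from the talk: the step list, table names and the run_pipeline helper are hypothetical, and SQLite stands in for the actual platform so the example runs anywhere.

```python
import sqlite3
from string import Template

# Hypothetical metadata: each entry names a step, its run order,
# a parametrised SQL template and the parameters to fill in.
PIPELINE = [
    {"step": "merge", "order": 2,
     "sql": "INSERT INTO $target SELECT id, name FROM $source "
            "WHERE id NOT IN (SELECT id FROM $target)",
     "params": {"source": "stage_products", "target": "products"}},
    {"step": "stage", "order": 1,
     "sql": "CREATE TABLE $target AS "
            "SELECT id, TRIM(name) AS name FROM $source",
     "params": {"source": "landing_products", "target": "stage_products"}},
]


def run_pipeline(conn, steps):
    """Render each SQL template from its metadata and run the steps in order."""
    executed = []
    for step in sorted(steps, key=lambda s: s["order"]):
        statement = Template(step["sql"]).substitute(step["params"])
        conn.execute(statement)
        executed.append(step["step"])
    conn.commit()
    return executed


def demo():
    """Build a tiny landing table, run the pipeline, return the merged rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE landing_products (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE products (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO landing_products VALUES (?, ?)",
        [(1, "  soap "), (2, "bleach")],
    )
    conn.execute("INSERT INTO products VALUES (1, 'soap')")
    run_pipeline(conn, PIPELINE)
    return conn.execute("SELECT id, name FROM products ORDER BY id").fetchall()
```

Because the steps live in data rather than in code, adding a new path for a dataset means adding rows, and the rows themselves can be kept under version control.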

But this is only half the battle. We also want to create data products: documented, quality-assured datasets that are intuitive to use. As we move to a CI/CD approach and deploy more frequently, keeping documentation and data quality assessments up to date becomes increasingly challenging. To solve this problem, we have expanded our ETL framework to include SQL processes that automate data quality activities. Using the Hive metastore as a starting point, we have leveraged this framework to automate the maintenance of a data dictionary and to reduce documentation, model refinement, data quality testing and the filtering of bad data to a box-filling exercise. In this talk we discuss our approach to maintaining high-quality data products and share examples of how we automate data quality processes.
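One way to picture the "box-filling" idea is a rule table in which each row is turned into a SQL check. The sketch below is a hedged illustration, not the speakers' implementation: the rule names, the RULE_SQL templates and the run_checks helper are assumptions, and SQLite is used in place of a Hive-backed platform.

```python
import sqlite3

# Hypothetical rule table: one row per check, the kind of entry
# an analyst might fill in without writing any SQL.
RULES = [
    {"table": "products", "column": "id", "rule": "not_null"},
    {"table": "products", "column": "id", "rule": "unique"},
    {"table": "products", "column": "price", "rule": "non_negative"},
]

# Each rule type maps to a SQL template that counts violating rows.
RULE_SQL = {
    "not_null": "SELECT COUNT(*) FROM {table} WHERE {column} IS NULL",
    "unique": "SELECT COUNT(*) - COUNT(DISTINCT {column}) FROM {table} "
              "WHERE {column} IS NOT NULL",
    "non_negative": "SELECT COUNT(*) FROM {table} WHERE {column} < 0",
}


def run_checks(conn, rules):
    """Generate one SQL check per rule row and return the violation counts."""
    results = []
    for rule in rules:
        sql = RULE_SQL[rule["rule"]].format(
            table=rule["table"], column=rule["column"]
        )
        violations = conn.execute(sql).fetchone()[0]
        results.append((rule["table"], rule["column"], rule["rule"], violations))
    return results


def demo_checks():
    """Load a few deliberately bad rows and run every rule against them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER, price REAL)")
    conn.executemany(
        "INSERT INTO products VALUES (?, ?)",
        [(1, 2.5), (1, 3.0), (2, 4.0), (None, -1.0)],
    )
    return run_checks(conn, RULES)
```

In a setup like this, the same rule rows could also feed documentation or a filter that excludes violating records, so a single entry serves several data quality activities.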

In this session watch:
Richard Chadwick, Analyst
Karol Sawicz, Analyst, Reckitt

Richard Chadwick

Richard Chadwick is a data engineering consultant at Cervello, a Kearney Company, where he works with enterprise clients to develop data products using the latest cloud technologies. Richard holds a M...

Karol Sawicz

Karol Sawicz is an IT Business Analyst at RB, where he is involved in various data projects with responsibilities ranging from building full end to end reporting solutions as well as project managing ...