Daniel graduated from the University of Bath in 2019 with a Master's degree in Integrated Mechanical and Electrical Engineering. Since joining Mars Petcare as a data engineer in January 2020, he has been involved in designing and building out Gecko, a bespoke CCPA compliance tool to be used within the Petcare Data Platform.
November 17, 2020 04:00 PM PT
The growing number of consumer data privacy laws poses ongoing challenges for data teams around the world that collect, store, and use data protected by these laws. The data engineering team at Mars Petcare is no exception. To respond to these challenges more efficiently and accurately, they have built Gecko: an efficient, auditable, and simple CCPA compliance ecosystem designed for Spark and Delta Lake.
Gecko has allowed us to simultaneously achieve the following benefits within our data platform:
- Automatically handle consumer deletion requests in a compliant manner.
- Increase the overall security of PII data in the Petcare Data Platform (PDP) Data Lake.
- Preserve the structure of non-PII data, so that it continues to provide analytical value and overall data integrity.
- Make PII data accessible when required.
These benefits have been achieved by a conceptually simple solution: using row-level (per-client) encryption for all PII tables in our system, whilst storing the encryption keys in a single, highly secure location in our lake. By leveraging the power of Spark and Delta Lake, the Gecko ecosystem can carry out a full encryption of all personal data, automatically handle consumer data requests, and decrypt personal data when required for other engineering or analytical projects.
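To make the idea concrete, here is a minimal, library-free sketch of per-client encryption with a central key store. It is not Gecko's implementation: the names `KeyStore`, `encrypt_row`, and `decrypt_row` are hypothetical, plain dictionaries stand in for Spark/Delta tables, and the SHAKE-256 keystream is only a toy stand-in for a vetted cipher such as AES-GCM. It also assumes that a deletion request is honoured by discarding the client's key (often called crypto-shredding), which the abstract implies but does not state.

```python
import hashlib
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream derived with SHAKE-256. Illustration only:
    # a real system should use a vetted cipher such as AES-GCM.
    return hashlib.shake_256(key + nonce).digest(length)


def encrypt(key: bytes, plaintext: str) -> bytes:
    data = plaintext.encode("utf-8")
    nonce = secrets.token_bytes(16)  # fresh nonce per value
    stream = _keystream(key, nonce, len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, stream))


def decrypt(key: bytes, blob: bytes) -> str:
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream)).decode("utf-8")


class KeyStore:
    """Hypothetical single location holding one key per client."""

    def __init__(self):
        self._keys = {}

    def key_for(self, client_id):
        # Create the client's key on first use.
        if client_id not in self._keys:
            self._keys[client_id] = secrets.token_bytes(32)
        return self._keys[client_id]

    def get(self, client_id):
        return self._keys.get(client_id)

    def forget(self, client_id):
        # Assumed deletion path: dropping the key makes the
        # client's encrypted PII unrecoverable everywhere.
        self._keys.pop(client_id, None)


def encrypt_row(store, row, pii_columns):
    # Encrypt only the PII columns; non-PII values are left untouched.
    key = store.key_for(row["client_id"])
    return {
        col: encrypt(key, val) if col in pii_columns else val
        for col, val in row.items()
    }


def decrypt_row(store, row, pii_columns):
    # If the key has been forgotten, PII comes back as None while
    # the non-PII columns keep their structure and values.
    key = store.get(row["client_id"])
    return {
        col: (decrypt(key, val) if key is not None else None)
        if col in pii_columns
        else val
        for col, val in row.items()
    }
```

In this sketch, calling `store.forget(client_id)` is the only step needed to service a deletion request: every encrypted copy of that client's PII becomes unreadable, while non-PII columns remain available for analysis.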
The process has the added benefit of generating a huge labelled training dataset containing all PII in the PDP, for future use in the design of a machine learning model for automatic PII detection. A tool such as this would then enable us to remove the risk of human error when labelling PII on ingestion, as well as enabling PII removal from free text fields.
This presentation will share:
- How the solution can achieve automated privacy rights requests and enhanced platform security.
- How Spark and Delta Lake have been leveraged in these applications.
- Why these technologies have been essential in achieving the necessary requirements.
Speakers: Jason Hale and Daniel Harrington