The Beauty of (Big) Data Privacy Engineering

Privacy engineering is an emerging discipline within software and data engineering that aims to provide the methodologies, tools, and techniques needed for engineered systems to deliver acceptable levels of privacy. In this talk, I will present our recent work on anonymization and privacy-preserving analytics for large-scale geolocation datasets. In particular, I focus on how to scale anonymization and geospatial analytics workloads with Spark, maximizing performance by combining multi-dimensional spatial indexing with Spark's in-memory computation.

In production, we have achieved performance improvements of more than 1,500x in geolocation anonymization and more than 10x in nearest-neighbour search over anonymized geo datasets.
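To make the idea of pairing anonymization with spatial indexing concrete, here is a minimal single-machine sketch in plain Python. It is not the Truata or Spark implementation described in the talk: the uniform grid, the cell size `CELL_DEG`, the planar distance in degrees, and all function names are assumptions chosen for illustration. Points are coarsened to the centre of their grid cell, and the same grid serves as an index for nearest-neighbour search.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01  # hypothetical grid cell size in degrees (an assumption)


def cell_of(lat, lon, cell=CELL_DEG):
    """Map a coordinate to its integer grid cell (row, col)."""
    return (math.floor(lat / cell), math.floor(lon / cell))


def generalize(lat, lon, cell=CELL_DEG):
    """Coarsen a point to the centre of its grid cell.

    This is simple spatial generalization; on its own it does not
    guarantee any formal privacy property.
    """
    row, col = cell_of(lat, lon, cell)
    return ((row + 0.5) * cell, (col + 0.5) * cell)


def build_index(points, cell=CELL_DEG):
    """Bucket (lat, lon) points by grid cell."""
    index = defaultdict(list)
    for lat, lon in points:
        index[cell_of(lat, lon, cell)].append((lat, lon))
    return index


def nearest(index, lat, lon, cell=CELL_DEG):
    """Return the indexed point closest to (lat, lon), or None if empty.

    Searches square rings of cells around the query cell. Uses planar
    Euclidean distance in degrees, which is an approximation.
    """
    if not index:
        return None
    row, col = cell_of(lat, lon, cell)
    best, best_d = None, math.inf
    ring = 0
    while True:
        for dr in range(-ring, ring + 1):
            for dc in range(-ring, ring + 1):
                if max(abs(dr), abs(dc)) != ring:
                    continue  # only visit cells on the ring's border
                for p in index.get((row + dr, col + dc), ()):
                    d = math.hypot(p[0] - lat, p[1] - lon)
                    if d < best_d:
                        best, best_d = p, d
        # Any point in an unvisited cell is at least ring * cell away.
        if best_d <= ring * cell:
            return best
        ring += 1


points = [(53.3498, -6.2603), (53.3440, -6.2672), (51.8985, -8.4756)]
index = build_index(points)
closest = nearest(index, 53.3450, -6.2670)
coarse = generalize(53.3498, -6.2603)
```

The grid lets the search inspect only a handful of nearby cells instead of scanning every point. In a distributed setting the cell key could also serve as a partitioning key, though how the talk's system partitions data is not stated here.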

Speaker: Yangcheng Huang

About Yangcheng Huang


Yangcheng Huang is currently Director of Software Engineering, Data & Analytics at Truata. He is responsible for Truata's core platforms for risk assessment, anonymization, and privacy-preserving analytics, which combine cutting-edge big data engineering and privacy-preserving machine learning algorithms with multi-cloud technologies. His interests include data privacy, big data analytics, data mining, and machine learning. He has published 33 patents and 30+ peer-reviewed research papers. Yangcheng holds a B.Eng. and an M.Eng. from Xi'an Jiao Tong University, China, and a PhD from University College London (UCL), UK.