Beyond the Privacy-Utility Tradeoff: Differential Privacy in Tabular Data Synthesis
Overview
Experience | In Person |
---|---|
Type | Lightning Talk |
Track | Artificial Intelligence |
Industry | Enterprise Technology, Health and Life Sciences, Financial Services |
Technologies | Llama, PyTorch |
Skill Level | Intermediate |
As organizations increasingly use sensitive data for AI applications, generating high-quality synthetic data with mathematical privacy guarantees has become crucial. This talk explores the use of Gretel Navigator to generate differentially private synthetic data that maintains high fidelity to the source data and high utility on downstream tasks across heterogeneous datasets. Our analysis presents a framework for privacy-preserving synthetic data generation, illustrated with two use cases: patient events and e-commerce reviews. We share strategies for: (1) calibrating the privacy parameters ε and δ for mixed-modal data; (2) applying record-level or user-level differential privacy, depending on which entity in the dataset requires protection; (3) maintaining statistical properties and high utility on downstream classification tasks under stringent privacy constraints (e.g., a difference in AUC of less than 0.05 when using DP); and (4) quantifying resilience to membership inference and attribute inference attacks.
Session Speakers
Lipika Ramaswamy
Gretel