Session

Beyond the Privacy-Utility Tradeoff: Differential Privacy in Tabular Data Synthesis

Overview

Tuesday, June 10, 3:40 pm

Experience	In Person
Type	Lightning Talk
Track	Artificial Intelligence
Industry	Enterprise Technology, Health and Life Sciences, Financial Services
Technologies	Llama, PyTorch
Skill Level	Intermediate
Duration	20 min

As organizations increasingly leverage sensitive data for AI applications, generating high-quality synthetic data with mathematical privacy guarantees has become crucial. This talk explores using Gretel Safe Synthetics (now part of NVIDIA) to generate differentially private synthetic data that stays faithful to the source data and retains high utility on downstream tasks across heterogeneous datasets. Our analysis presents a framework for privacy-preserving synthetic data generation and applies it to two use cases: e-commerce reviews and doctor’s notes. We share practical strategies for:

Calibrating privacy parameters ε and δ for mixed text and tabular data
Maintaining statistical properties and high utility on downstream classification tasks under stringent privacy constraints (e.g., an AUC difference of less than 0.05 when DP is applied)
Quantifying resilience to membership inference and attribute inference attacks
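Calibrating ε and δ ultimately comes down to choosing how much noise to add. As a minimal sketch, assuming the textbook Gaussian mechanism for a single query with L2 sensitivity Δ (not necessarily what Gretel Safe Synthetics does internally), the noise scale is σ = Δ·√(2 ln(1.25/δ))/ε, a bound that holds only for 0 < ε < 1. The helper names `gaussian_sigma` and `delta_is_small` are hypothetical. Fine-tuning a generative model with DP-SGD composes many noisy steps and needs a privacy accountant (for example, the one shipped with Opacus for PyTorch) rather than this one-shot formula.

```python
import math


def gaussian_sigma(epsilon: float, delta: float, sensitivity: float = 1.0) -> float:
    """Noise standard deviation for the classic Gaussian mechanism.

    sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon gives
    (epsilon, delta)-DP for a single query, but only when 0 < epsilon < 1.
    """
    if not 0 < epsilon < 1:
        raise ValueError("the classic Gaussian bound requires 0 < epsilon < 1")
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    if sensitivity <= 0:
        raise ValueError("sensitivity must be positive")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def delta_is_small(delta: float, n_records: int) -> bool:
    """Rule-of-thumb check that delta < 1/n.

    A mechanism that publishes each record with probability delta still meets
    the (0, delta) definition while leaking about delta * n records on average,
    so delta is usually chosen far below 1/n; this only checks the upper limit.
    """
    return delta < 1.0 / n_records
```

For example, ε = 0.5 and δ = 1e-5 with Δ = 1 give σ of roughly 9.69; halving ε doubles the noise, while shrinking δ raises it only logarithmically.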
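Both the utility target and the attack evaluation can be expressed as ROC AUC. The dependency-free sketch below assumes you already have classifier scores for positive and negative holdout examples (for the utility gap) or attack scores for training-set members and non-members (for membership inference). AUC is computed as the probability that a randomly chosen positive outscores a randomly chosen negative. The function names and the 0.05 default threshold (borrowed from the example above) are illustrative assumptions, not a published evaluation protocol.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Probability that a positive outscores a negative; ties count as 1/2.

    O(len(pos) * len(neg)): fine for a sketch, too slow for large test sets.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def utility_gap_ok(auc_real: float, auc_dp_synth: float, max_gap: float = 0.05) -> bool:
    """True if a model trained on DP synthetic data loses less than max_gap AUC
    versus one trained on real data (assumes both use the same real holdout)."""
    return (auc_real - auc_dp_synth) < max_gap


def mia_advantage(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """Distance of a membership inference attack's AUC from 0.5 (random
    guessing); 0.0 means the attack cannot separate members from non-members."""
    return abs(roc_auc(member_scores, nonmember_scores) - 0.5)
```

Reporting the attack as a distance from 0.5 keeps the metric symmetric: an attack that is reliably wrong is as informative as one that is reliably right.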

Session Speakers

Lipika Ramaswamy

Research Scientist
NVIDIA