Session
Beyond the Privacy-Utility Tradeoff: Differential Privacy in Tabular Data Synthesis
Overview
Detail | Value |
---|---|
Experience | In Person |
Type | Lightning Talk |
Track | Artificial Intelligence |
Industry | Enterprise Technology, Health and Life Sciences, Financial Services |
Technologies | Llama, PyTorch |
Skill Level | Intermediate |
Duration | 20 min |
As organizations increasingly leverage sensitive data for AI applications, generating high-quality synthetic data with mathematical guarantees of privacy has become crucial. This talk explores the use of NeMo Safe Synthetics to generate differentially private (DP) synthetic data that maintains high fidelity to the source data and high utility on downstream tasks across heterogeneous datasets. Our analysis presents a framework for privacy-preserving synthetic data generation, applied to two use cases: e-commerce reviews and doctor’s notes. We present strategies for:
- Calibrating privacy parameters ε and δ for mixed text and tabular data
- Preserving statistical properties and high utility on downstream classification tasks under stringent privacy constraints (e.g., an AUC difference of less than 0.05 when using DP)
- Quantifying resilience to membership inference and attribute inference attacks
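The calibration and evaluation steps listed above can be illustrated with a minimal Python sketch. It is not the NeMo Safe Synthetics API or the method used in this talk. `gaussian_sigma` applies the classic Gaussian-mechanism bound σ = Δ·sqrt(2 ln(1.25/δ))/ε, which only holds for 0 < ε < 1. DP training of generative models usually tracks the privacy budget with a privacy accountant instead. `roc_auc` is a plain rank-based AUC that can serve two purposes: comparing classifiers trained on real versus synthetic data, and scoring a membership inference attack, where an AUC near 0.5 means the attacker does no better than chance. The helper names, the δ = 0.1/n choice, and all scores are hypothetical.

```python
import math
from typing import Sequence


def gaussian_sigma(epsilon: float, delta: float, sensitivity: float = 1.0) -> float:
    """Noise std for the classic (epsilon, delta) Gaussian mechanism.

    Uses sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon, which is
    only a valid guarantee for 0 < epsilon < 1. Hypothetical helper, not a
    NeMo Safe Synthetics API.
    """
    if not 0 < epsilon < 1:
        raise ValueError("classic Gaussian bound requires 0 < epsilon < 1")
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based ROC AUC: chance a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    n_records = 10_000
    delta = 0.1 / n_records  # illustrative choice that keeps delta below 1/n
    sigma = gaussian_sigma(epsilon=0.5, delta=delta)
    print(f"epsilon=0.5, delta={delta:.0e}: sigma = {sigma:.2f}")

    labels = [1, 1, 0, 0]
    # Toy classifier scores on a held-out real test set (hypothetical numbers).
    auc_real = roc_auc([0.9, 0.8, 0.3, 0.2], labels)   # model trained on real data
    auc_synth = roc_auc([0.9, 0.4, 0.5, 0.2], labels)  # model trained on DP synthetic data
    print(f"utility gap (AUC): {auc_real - auc_synth:.2f}")

    # Toy membership inference scores: label 1 = record was in the training set.
    attack_auc = roc_auc([0.6, 0.4, 0.5, 0.5], labels)
    print(f"membership inference AUC: {attack_auc:.2f} (0.5 = chance)")
```

Smaller ε forces a larger σ, which is the source of the utility cost that the second item above aims to keep small. The rank-based AUC is used for both utility and attack evaluation so that the two numbers share a scale.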
Session Speakers
Lipika Ramaswamy
Research Scientist
NVIDIA