Session

Beyond the Privacy-Utility Tradeoff: Differential Privacy in Tabular Data Synthesis

Overview

Experience: In Person
Type: Lightning Talk
Track: Artificial Intelligence
Industry: Enterprise Technology, Health and Life Sciences, Financial Services
Technologies: Llama, PyTorch
Skill Level: Intermediate
Duration: 20 min

As organizations increasingly use sensitive data in AI applications, generating high-quality synthetic data with mathematical privacy guarantees has become crucial. This talk explores using NeMo Safe Synthetics to generate differentially private synthetic data that maintains high fidelity to the source data and high utility on downstream tasks across heterogeneous datasets. We present a framework for privacy-preserving synthetic data generation through two use cases: e-commerce reviews and doctors' notes. We cover strategies for:

  • Calibrating privacy parameters ε and δ for mixed text and tabular data
  • Maintaining statistical properties and high utility on downstream classification tasks under stringent privacy constraints (e.g., less than 0.05 difference in AUC when using differential privacy)
  • Quantifying resilience to membership inference and attribute inference attacks
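
As a rough illustration of how the last two points might be measured, the sketch below computes a downstream AUC gap and scores a membership inference attack by its AUC. It is a minimal, standard-library-only example and not part of NeMo Safe Synthetics; the function names (`roc_auc`, `auc_gap`, `membership_attack_auc`) and all numbers are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_gap(labels: Sequence[int],
            scores_real_trained: Sequence[float],
            scores_synth_trained: Sequence[float]) -> float:
    """AUC of a classifier trained on real data minus AUC of one trained on
    synthetic data, both scored on the same held-out real test set."""
    return roc_auc(labels, scores_real_trained) - roc_auc(labels, scores_synth_trained)


def membership_attack_auc(member_scores: Sequence[float],
                          nonmember_scores: Sequence[float]) -> float:
    """AUC of an attacker's membership scores; 0.5 means no better than chance."""
    labels = [1] * len(member_scores) + [0] * len(nonmember_scores)
    return roc_auc(labels, list(member_scores) + list(nonmember_scores))


if __name__ == "__main__":
    # Toy values, made up for illustration only.
    y_test = [0, 0, 1, 1]
    real_model_scores = [0.1, 0.2, 0.7, 0.9]
    synth_model_scores = [0.1, 0.4, 0.35, 0.8]
    gap = auc_gap(y_test, real_model_scores, synth_model_scores)
    print(f"AUC gap: {gap:.2f}, within 0.05: {abs(gap) < 0.05}")
    print("attack AUC:", membership_attack_auc([0.6, 0.4], [0.5, 0.5]))
```

In this framing, a small `auc_gap` suggests the synthetic data preserves downstream utility, and an attack AUC near 0.5 suggests the attacker cannot distinguish training records from held-out ones.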

Session Speakers

Lipika Ramaswamy

Research Scientist
NVIDIA